A Large Language Model Approach to Educational Survey Feedback Analysis (2309.17447v2)

Published 29 Sep 2023 in cs.CL

Abstract: This paper assesses the potential for the LLMs GPT-4 and GPT-3.5 to aid in deriving insight from educational feedback surveys. Exploration of LLM use cases in education has focused on teaching and learning, with less attention to education feedback analysis. Survey analysis in education involves goals such as finding gaps in curricula or evaluating teachers, and often requires time-consuming manual processing of textual responses. LLMs have the potential to provide a flexible means of achieving these goals without specialized machine learning models or fine-tuning. We demonstrate a versatile approach to such goals by treating them as sequences of NLP tasks, including classification (multi-label, multi-class, and binary), extraction, thematic analysis, and sentiment analysis, each performed by an LLM. We apply these workflows to a real-world dataset of 2500 end-of-course survey comments from biomedical science courses, and evaluate a zero-shot approach (i.e., requiring no examples or labeled training data) across all tasks, reflecting education settings where labeled data is often scarce. By applying effective prompting practices, we achieve human-level performance on multiple tasks with GPT-4, enabling the workflows necessary to achieve typical goals. We also show the potential of inspecting an LLM's chain-of-thought (CoT) reasoning to provide insight that may foster confidence in practice. Moreover, this study features the development of a versatile set of classification categories, suitable for various course types (online, hybrid, or in-person) and amenable to customization. Our results suggest that LLMs can be used to derive a range of insights from survey text.
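The workflow pattern the abstract describes (zero-shot classification over free-text comments, with inspectable chain-of-thought) can be sketched as follows. This is not the authors' code: the category names are illustrative rather than the paper's actual taxonomy, and the model call is stubbed out where a real implementation would call an LLM API such as GPT-4.

```python
# Sketch of one zero-shot multi-label classification step, assuming a
# hypothetical call_llm(prompt) -> str function backed by an LLM API.

CATEGORIES = ["course content", "instructor", "assessment", "logistics"]

def build_prompt(comment: str) -> str:
    """Zero-shot prompt: no examples or labeled training data required."""
    return (
        "Classify the student comment into zero or more of these categories: "
        + ", ".join(CATEGORIES)
        + ".\nThink step by step, then end with a line "
        + "'Labels: <comma-separated labels>'.\n"
        + f"Comment: {comment}"
    )

def parse_labels(llm_output: str) -> list[str]:
    """Extract the final label line; the preceding chain-of-thought text
    can be inspected separately to build confidence in the result."""
    for line in reversed(llm_output.strip().splitlines()):
        if line.lower().startswith("labels:"):
            raw = line.split(":", 1)[1]
            return [c.strip() for c in raw.split(",") if c.strip() in CATEGORIES]
    return []

def classify(comment: str, call_llm) -> list[str]:
    return parse_labels(call_llm(build_prompt(comment)))

# Stubbed model response, standing in for a real API call:
fake_llm = lambda prompt: (
    "The comment praises the teacher's clarity.\nLabels: instructor"
)
print(classify("Dr. X explained the concepts clearly.", fake_llm))
```

Chaining several such steps (e.g. sentiment after classification, or extraction feeding thematic analysis) yields the task sequences described above, while keeping the raw CoT output available for human review.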

https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Onan, A.: Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks. Concurr. Comput. 33(23) (2021) https://doi.org/10.1002/cpe.5909 Devlin et al. [2018] Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018) arXiv:1810.04805 [cs.CL] Deepa et al. [2019] Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and Dictionary-Based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456 Zhang et al. [2020] Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. 
Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187 Unankard and Nadee [2020] Unankard, S., Nadee, W.: Topic detection for online course feedback using lda. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. [2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). 
https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. 
[2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018) arXiv:1810.04805 [cs.CL] Deepa et al. [2019] Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and Dictionary-Based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456 Zhang et al. [2020] Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187 Unankard and Nadee [2020] Unankard, S., Nadee, W.: Topic detection for online course feedback using lda. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. 
[2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. 
IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and Dictionary-Based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456 Zhang et al. [2020] Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187 Unankard and Nadee [2020] Unankard, S., Nadee, W.: Topic detection for online course feedback using lda. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. [2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. 
IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. 
IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. 
https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL] Zhang et al. [2020] Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187 Unankard and Nadee [2020] Unankard, S., Nadee, W.: Topic detection for online course feedback using LDA. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science, vol. 11984. 
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. [2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556
https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. 
Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. 
[2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. 
Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain-of-thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators?
Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
Onan, A.: Sentiment analysis on massive open online course evaluations: A text mining and deep learning approach. Comput. Appl. Eng. Educ. 29(3), 572–589 (2021). https://doi.org/10.1002/cae.22253
Aldeman, M., Branoff, T.J.: Impact of course modality on student course evaluations. In: 2021 ASEE Virtual Annual Conference Content Access. ASEE Conferences, Virtual Conference (2021). https://peer.asee.org/37275.pdf
Veselovsky, V., Ribeiro, M.H., West, R.: Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks (2023). arXiv:2306.07899 [cs.CL]
Onan, A.: Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks. Concurr. Comput. 33(23) (2021). https://doi.org/10.1002/cpe.5909
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018). arXiv:1810.04805 [cs.CL]
Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and dictionary-based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456
Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020). https://doi.org/10.1142/S0218213020400187
Unankard, S., Nadee, W.: Topic detection for online course feedback using LDA. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16
Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019). https://doi.org/10.1109/TE.2019.2924385
Perez-Encinas, A., Rodriguez-Pomeda, J.: International students' perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018). https://doi.org/10.1177/1028315317724556
Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on students' feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019). https://doi.org/10.1109/ACCESS.2019.2928872
Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI 2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20
Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS, Online (2021). https://doi.org/10.5220/0010255108660873
[16] Papers with Code – Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
[17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs. IEEE Access 8, 106799–106810 (2020). https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (ICCAI '20), pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022). https://doi.org/10.1109/ACCESS.2022.3177752
Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018) arXiv:1810.04805 [cs.CL] Deepa et al. [2019] Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and Dictionary-Based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456 Zhang et al. [2020] Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187 Unankard and Nadee [2020] Unankard, S., Nadee, W.: Topic detection for online course feedback using lda. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
  3. Aldeman, M., Branoff, T.J.: Impact of course modality on student course evaluations. In: 2021 ASEE Virtual Annual Conference Content Access. ASEE Conferences, Virtual Conference (2021). https://peer.asee.org/37275.pdf
Veselovsky, V., Ribeiro, M.H., West, R.: Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks (2023) arXiv:2306.07899 [cs.CL]
Onan, A.: Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks. Concurr. Comput. 33(23) (2021) https://doi.org/10.1002/cpe.5909
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018) arXiv:1810.04805 [cs.CL]
Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and dictionary-based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456
Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187
Unankard, S., Nadee, W.: Topic detection for online course feedback using LDA. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16
Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385
Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556
Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on students’ feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872
Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20
Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873
[16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
[17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (ICCAI ’20), pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed: 2023-8-21
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed: 2023-8-21
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
4. Veselovsky, V., Ribeiro, M.H., West, R.: Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks (2023) arXiv:2306.07899 [cs.CL]
Onan, A.: Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks. Concurr. Comput. 33(23) (2021) https://doi.org/10.1002/cpe.5909
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018) arXiv:1810.04805 [cs.CL]
Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and dictionary-based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456
Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187
Unankard, S., Nadee, W.: Topic detection for online course feedback using LDA. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16
Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385
Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556
Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872
Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20
Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873
[16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
[17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall et al.
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Unankard, S., Nadee, W.: Topic detection for online course feedback using lda. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. 
[2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. 
IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. 
Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. 
[2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. 
SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. 
Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed: 2023-8-21
Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed: 2023-8-21
Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019). https://doi.org/10.1109/ACCESS.2019.2928872
Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20
Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS – Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873
[16] Papers with Code – Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
[17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020). https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence, ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022). https://doi.org/10.1109/ACCESS.2022.3177752
Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]
Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed: 2023-08-21
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-08-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-08-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed: 2023-08-21
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  5. Onan, A.: Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks. Concurr. Comput. 33(23) (2021) https://doi.org/10.1002/cpe.5909
  Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018) arXiv:1810.04805 [cs.CL]
  Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and dictionary-based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456
  Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187
  Unankard, S., Nadee, W.: Topic detection for online course feedback using LDA. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16
  Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385
  Perez-Encinas, A., Rodriguez-Pomeda, J.: International students' perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556
  Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on student's feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872
  Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI'2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20
  Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873
  [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
  [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
  Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
  Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI '20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
  Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
  Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
  Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
  Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
  Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
  Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
  Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
  Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
  Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
  Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
  [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
  Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
  [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
  Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
  Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
  Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
  Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
  White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
  Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
  [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
  Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
  Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
  Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
  Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
  Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
  Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
  Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
  Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
  Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. 
Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. 
[2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. 
In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (ICCAI '20), pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022). https://doi.org/10.1109/ACCESS.2022.3177752
Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed 2023-08-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed 2023-08-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed 2023-08-21
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents (2023). https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed 2023-08-21
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. 
In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed 2023-08-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed 2023-08-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed 2023-08-21
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed 2023-08-21
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  7. Deepa, D., Raaji, Tamilarasi, A.: Sentiment analysis using feature extraction and Dictionary-Based approaches. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 786–790 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032456 Zhang et al. [2020] Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187 Unankard and Nadee [2020] Unankard, S., Nadee, W.: Topic detection for online course feedback using lda. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. [2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. 
Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187 Unankard and Nadee [2020] Unankard, S., Nadee, W.: Topic detection for online course feedback using lda. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. [2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 
62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. 
[2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Unankard, S., Nadee, W.: Topic detection for online course feedback using lda. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16 Cunningham-Nelson et al. [2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). 
https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385 Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. 
[2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. 
In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
8. Zhang, H., Dong, J., Min, L., Bi, P.: A BERT fine-tuning model for targeted sentiment analysis of Chinese online course reviews. Int. J. Artif. Intell. Tools 29(07n08), 2040018 (2020) https://doi.org/10.1142/S0218213020400187
Unankard and Nadee [2020] Unankard, S., Nadee, W.: Topic detection for online course feedback using LDA. In: Popescu, E., Hao, T., Hsu, T.C., Xie, H., Temperini, M., Chen, W. (eds.) Emerging Technologies for Education. SETE 2019. Lecture Notes in Computer Science, vol. 11984. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38778-5_16
Cunningham-Nelson et al. [2019] Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385
Perez-Encinas and Rodriguez-Pomeda [2018] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students' perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556
Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on student's feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872
Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI'2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20
Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS – Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873
[16] Papers with Code – Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
[17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI '20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. 
Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. 
[2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. 
[2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. 
Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. 
[2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI '20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
10. Cunningham-Nelson, S., Baktashmotlagh, M., Boles, W.: Visualizing student opinion through text analysis. IEEE Trans. Educ. 62(4), 305–311 (2019) https://doi.org/10.1109/TE.2019.2924385
Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556
Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872
Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20
Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873
[16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
[17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018) https://doi.org/10.1177/1028315317724556 Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-Based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872 Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. 
[2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. 
Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
[2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. 
In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
Perez-Encinas, A., Rodriguez-Pomeda, J.: International students’ perceptions of their needs when going abroad: Services on demand. Journal of Studies in International Education 22(1), 20–36 (2018). https://doi.org/10.1177/1028315317724556

Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019). https://doi.org/10.1109/ACCESS.2019.2928872

Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI 2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20

Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS, Online (2021). https://doi.org/10.5220/0010255108660873

Papers with Code – Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed 2023-08-21

Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed 2023-08-21

Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020). https://doi.org/10.1109/ACCESS.2020.3000739

Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (ICCAI ’20), pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633

Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022). https://doi.org/10.1109/ACCESS.2022.3177752

Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2

Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]

Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]

Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)

Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23

Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]

Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]

Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]

Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]

Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed 2023-08-21

Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. American Journal of Pharmaceutical Education 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177

Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed 2023-08-21

Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]

Braun, V., Clarke, V.: Using thematic analysis in psychology. Qualitative Research in Psychology 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa

Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]

White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]

Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]

cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed 2023-08-21

Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]

Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]

Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]

Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]

Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]

Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed 2023-08-21

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]

Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. 
Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd-workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
Sindhu et al. [2019] Sindhu, I., Muhammad Daudpota, S., Badar, K., Bakhtyar, M., Baber, J., Nurunnabi, M.: Aspect-based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7, 108729–108741 (2019) https://doi.org/10.1109/ACCESS.2019.2928872
Sutoyo et al. [2021] Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20
Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873
[16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
[17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
  13. Sutoyo, E., Almaarif, A., Yanto, I.T.R.: Sentiment analysis of student evaluations of teaching using deep learning approach. In: International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020), pp. 272–281. Springer, Uyo, Akwa Ibom State, Nigeria (2021). https://doi.org/10.1007/978-3-030-80216-5_20 Meidinger and Aßenmacher [2021] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. 
[2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873 [16] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. 
Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 
282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. 
IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. 
https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. 
[2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI '20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. 
Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. 
[2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
14. Meidinger, M., Aßenmacher, M.: A new benchmark for NLP in social sciences: Evaluating the usefulness of pre-trained language models for classifying open-ended survey responses. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, Online (2021). https://doi.org/10.5220/0010255108660873
16. Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21
17. Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
18. Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs. IEEE Access 8, 106799–106810 (2020). https://doi.org/10.1109/ACCESS.2020.3000739
19. Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (ICCAI '20), pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
20. Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022). https://doi.org/10.1109/ACCESS.2022.3177752
21. Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
22. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]
23. Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
24. Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
25. Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Utrecht, The Netherlands, Proceedings, Part I, pp. 282–292. Springer (2021). https://doi.org/10.1007/978-3-030-78292-4_23
26. Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
27. Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
28. Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
29. Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
30. Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
31. Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
32. Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
33. Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
34. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35, 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
35. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
36. Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
37. White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
38. Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
39. cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed: 2023-8-21
40. Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
41. Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
42. Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
43. Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
44. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
45. Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
46. Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed: 2023-8-21
47. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
48. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. 
[2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. 
In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. 
J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). 
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. 
[2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. 
Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  15. Papers with Code - Machine Learning Datasets. https://paperswithcode.com/datasets?task=text-classification. Accessed: 2023-8-21 [17] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. 
Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21 Kastrati et al. [2020a] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. [2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. 
[2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for Aspect-Based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020) https://doi.org/10.1109/ACCESS.2020.3000739 Kastrati et al. [2020b] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. 
[2022] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752 Edalati et al. [2022] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-Based opinion mining of students’ reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence. ICCAI ’20, pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633 Shaik et al. 
[20] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022). https://doi.org/10.1109/ACCESS.2022.3177752
[21] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
[22] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]
[23] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
[24] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
[25] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
[26] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
[27] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
[28] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
[29] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed 2023-08-21
[31] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed 2023-08-21
[33] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
[34] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35, 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
[35] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
[36] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
[37] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
[38] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest. Hugging Face (2022). https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed 2023-08-21
[40] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
[41] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
[42] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
[43] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
[44] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
[45] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
[46] Weng, L.: LLM Powered Autonomous Agents (2023). https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed 2023-08-21
[47] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
[48] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. 
[2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[16] Hugging Face – The AI community building the future. https://huggingface.co/datasets?task_categories=task_categories:zero-shot-classification&sort=trending. Accessed: 2023-8-21
Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs. IEEE Access 8, 106799–106810 (2020). https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (ICCAI '20), pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022). https://doi.org/10.1109/ACCESS.2022.3177752
Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Utrecht, The Netherlands, Proceedings, Part I, pp. 282–292. Springer (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. 
[2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs. IEEE Access 8, 106799–106810 (2020). https://doi.org/10.1109/ACCESS.2020.3000739
Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students' reviews on online courses. In: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (ICCAI '20), pp. 510–514. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3404555.3404633
Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022). https://doi.org/10.1109/ACCESS.2022.3177752
Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019). arXiv:1908.10084 [cs.CL]
Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning (2023). arXiv:2304.06588 [cs.CL]
Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd-workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain-of-thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. 
[2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. 
Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[19] Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., Galligan, L.: A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access 10, 56720–56739 (2022) https://doi.org/10.1109/ACCESS.2022.3177752
[20] Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students' feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2
[21] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL]
[22] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with zero-shot learning (2023) arXiv:2304.06588 [cs.CL]
[23] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
[24] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
[25] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL]
[26] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL]
[27] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
[28] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
[31] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
[33] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
[34] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
[35] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
[36] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
[37] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
[38] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
[40] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
[41] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
[42] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
[43] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
[44] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
[45] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
[46] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
[47] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
[48] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 
83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). 
https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. 
[2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  20. Edalati, M., Imran, A.S., Kastrati, Z., Daudpota, S.M.: The potential of machine learning algorithms for sentiment classification of students’ feedback on MOOC. In: Intelligent Systems and Applications, pp. 11–22. Springer, Amsterdam (2022). https://doi.org/10.1007/978-3-030-82199-9_2 Reimers and Gurevych [2019] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. 
In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. 
[2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. 
[25] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
[26] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
[27] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
[28] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd-workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
[29] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-08-21
[31] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-08-21
[33] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
[34] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
[35] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
[36] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
[37] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
[38] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed: 2023-08-21
[40] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
[41] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
[42] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
[43] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
[44] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
[45] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
[46] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed: 2023-08-21
[47] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
[48] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  21. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019) arXiv:1908.10084 [cs.CL] Törnberg [2023] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Törnberg, P.: ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with Zero-Shot learning (2023) arXiv:2304.06588 [cs.CL] Jansen et al. [2023] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023) Masala et al. [2021] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. 
[24] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
[25] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
[26] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
[27] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
[28] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd-workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
[29] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
[31] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
[33] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
[34] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
[35] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
[36] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
[37] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
[38] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed: 2023-8-21
[40] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
[41] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
[42] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
[43] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
[44] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
[45] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
[46] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed: 2023-8-21
[47] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
[48] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23 Reiss [2023] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. 
Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. 
https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. 
[2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
[23] Jansen, B.J., Jung, S.-G., Salminen, J.: Employing large language models in survey research. Natural Language Processing Journal 4, 100020 (2023)
[24] Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
[25] Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
[26] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
[27] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd-workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
[28] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
[31] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
[33] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
[34] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
[35] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
[36] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
[37] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
[38] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
[40] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
[41] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
[42] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
[43] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
[44] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
[45] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
[46] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
[47] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
[48] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? 
potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. 
[2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. 
[2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
24. Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and clustering main ideas from student feedback using language models. In: Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Proceedings, Part I, pp. 282–292. Springer, Utrecht, The Netherlands (2021). https://doi.org/10.1007/978-3-030-78292-4_23
25. Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023). arXiv:2304.11085 [cs.CL]
26. Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
27. Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd-workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
28. Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
30. Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
31. Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
32. Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
33. Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
34. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
35. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
36. Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
37. White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
38. Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
39. cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
40. Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
41. Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
42. Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
43. Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
44. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
45. Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
46. Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
47. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
48. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. 
[2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  25. Reiss, M.V.: Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark (2023) arXiv:2304.11085 [cs.CL] Pangakis et al. [2023] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023) arXiv:2306.00176 [cs.CL] Gilardi et al. [2023] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms Crowd-Workers for Text-Annotation tasks (2023) arXiv:2303.15056 [cs.CL] Huang et al. [2023] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL] [30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 
3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[26] Pangakis, N., Wolken, S., Fasching, N.: Automated annotation with generative AI requires validation (2023). arXiv:2306.00176 [cs.CL]
Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd-workers for text-annotation tasks (2023). arXiv:2303.15056 [cs.CL]
Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023). arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019). https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022). arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022). arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021). arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023). arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022). arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest – Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest (2022). Accessed: 2023-8-21
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022). arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023). arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023). arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023). arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022). arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023). arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/ (2023). Accessed: 2023-8-21
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022). arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023). arXiv:2303.17580 [cs.CL]
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. 
[2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. 
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. 
Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. 
Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. 
https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
[27] Gilardi, F., Alizadeh, M., Kubli, M.: ChatGPT outperforms crowd-workers for text-annotation tasks (2023) arXiv:2303.15056 [cs.CL]
[28] Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
[30] Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
[31] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
[32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
[33] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
[34] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
[35] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
[36] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
[37] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
[38] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
[39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
[40] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
[41] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
[42] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
[43] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
[44] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
[45] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
[46] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
[47] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
[48] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  28. Huang, F., Kwak, H., An, J.: Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech (2023) arXiv:2302.07736 [cs.CL]
  30. Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21
  31. Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
  32. Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
  33. Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
  34. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
  35. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
  36. Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
  37. White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
  38. Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
  39. cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
  40. Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
  41. Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
  42. Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
  43. Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
  44. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
  45. Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
  46. Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
  47. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
  48. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. 
[2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  29. Best Practices and Sample Questions for Course Evaluation Surveys. https://assessment.wisc.edu/best-practices-and-sample-questions-for-course-evaluation-surveys/. Accessed: 2023-8-21 Medina et al. [2019] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. 
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177 [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. 
[2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
  30. Medina, M.S., Smith, W.T., Kolluru, S., Sheaffer, E.A., DiVall, M.: A review of strategies for designing, administering, and using student ratings of instruction. Am. J. Pharm. Educ. 83(5), 7177 (2019) https://doi.org/10.5688/ajpe7177
  [32] Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21
  33. Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
  34. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems, pp. 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
  35. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
  36. Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
  37. White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
  38. Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
  [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
  40. Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
  41. Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
  42. Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
  43. Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
  44. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
  45. Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
  46. Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
  47. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
  48. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. 
[2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. 
Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. 
Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. 
[2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. 
https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  31. Course Evaluations Question Bank. https://teaching.berkeley.edu/course-evaluations-question-bank. Accessed: 2023-8-21 Kojima et al. [2022] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are Zero-Shot reasoners (2022) arXiv:2205.11916 [cs.CL] Wei et al. [2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. 
[2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL] Braun and Clarke [2006] Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa Reynolds and McDonell [2021] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL] White et al. [2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE] Tunstall et al. [2022] Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL] [39] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners (2022) arXiv:2205.11916 [cs.CL]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm (2021) arXiv:2102.07350 [cs.CL]
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with self-feedback (2023) arXiv:2303.17651 [cs.CL]
Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  33. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models, 24824–24837 (2022) arXiv:2201.11903 [cs.CL]
  34. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006) https://doi.org/10.1191/1478088706qp063oa
[2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022) Loureiro et al. [2022] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. 
[2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from twitter (2022) arXiv:2202.03829 [cs.CL] Chen et al. [2023] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. 
[2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL] Kıcıman et al. [2023] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. 
[2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI] Peng et al. [2023] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL] Wang et al. [2022] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL] Madaan et al. [2023] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. 
[2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL] Weng [2023] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023) Yao et al. [2022] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. 
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  35. Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the Few-Shot paradigm (2021) arXiv:2102.07350 [cs.CL]
  36. White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with ChatGPT (2023) arXiv:2302.11382 [cs.SE]
  37. Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., Pereg, O.: Efficient few-shot learning without prompts (2022) arXiv:2209.11055 [cs.CL]
  38. cardiffnlp/twitter-roberta-base-sentiment-latest - Hugging Face. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. Accessed: 2023-8-21 (2022)
  39. Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
  40. Chen, L., Zaharia, M., Zou, J.: How is ChatGPT's behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
  41. Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
  42. Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
  43. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
  44. Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL]
  45. Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
  46. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
  47. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
[2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL] Shen et al. [2023] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL] Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face (2023) arXiv:2303.17580 [cs.CL]
  39. Loureiro, D., Barbieri, F., Neves, L., Anke, L.E., Camacho-Collados, J.: TimeLMs: Diachronic language models from Twitter (2022) arXiv:2202.03829 [cs.CL]
  40. Chen, L., Zaharia, M., Zou, J.: How is ChatGPT’s behavior changing over time? (2023) arXiv:2307.09009 [cs.CL]
  41. Kıcıman, E., Ness, R., Sharma, A., Tan, C.: Causal reasoning and large language models: Opening a new frontier for causality (2023) arXiv:2305.00050 [cs.AI]
  42. Peng, B., Li, C., He, P., Galley, M., Gao, J.: Instruction tuning with GPT-4 (2023) arXiv:2304.03277 [cs.CL]
  43. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., Zhou, D.: Self-Consistency improves chain of thought reasoning in language models (2022) arXiv:2203.11171 [cs.CL]
  44. Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B.P., Hermann, K., Welleck, S., Yazdanbakhsh, A., Clark, P.: Self-Refine: Iterative refinement with Self-Feedback (2023) arXiv:2303.17651 [cs.CL]
  45. Weng, L.: LLM Powered Autonomous Agents. https://lilianweng.github.io/posts/2023-06-23-agent/. Accessed: 2023-8-21 (2023)
  46. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models (2022) arXiv:2210.03629 [cs.CL]
  47. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face (2023) arXiv:2303.17580 [cs.CL]
Authors
  1. Michael J. Parker
  2. Caitlin Anderson
  3. Claire Stone
  4. YeaRim Oh