Effect of prompt engineering on accuracy, readability, and topic modeling of ChatGPT responses
Determine how adding contextual information or altering prompt wording (prompt engineering) affects answer accuracy, Flesch–Kincaid and SMOG readability indices, and Latent Dirichlet Allocation (LDA) topic-model outputs for responses generated by ChatGPT (GPT-3.5, GPT-4, and GPT-4o mini) to graduate-level statistics exam questions.
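The three outcome measures named above can be illustrated concretely. The sketch below is not the paper's pipeline; it is a minimal, self-contained illustration that computes Flesch–Kincaid grade level and the SMOG index from their published formulas (using a crude vowel-group syllable heuristic, an assumption on our part) and fits an LDA topic model with scikit-learn on a hypothetical toy corpus of model responses.

```python
import math
import re

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels, drop a silent final 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def _text_stats(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return len(sentences), len(words), syllables


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    n_sent, n_words, syl = _text_stats(text)
    return 0.39 * (n_words / n_sent) + 11.8 * (sum(syl) / n_words) - 15.59


def smog_index(text):
    """SMOG index: 1.0430*sqrt(polysyllables * 30 / sentences) + 3.1291."""
    n_sent, _, syl = _text_stats(text)
    polysyllables = sum(1 for s in syl if s >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291


# Hypothetical toy corpus standing in for ChatGPT answers to statistics questions.
responses = [
    "The variance measures the spread of the data around the mean.",
    "A confidence interval quantifies uncertainty in the estimated mean.",
    "The p value measures evidence against the null hypothesis.",
    "Standard deviation is the square root of the variance of the data.",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(responses)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # per-document topic proportions
```

Because the syllable counter is heuristic, the readability scores will differ slightly from tools such as `textstat`; the point is only to show the quantities being compared across prompt variants.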
In the current paper, we did not change the wording of any question, in order to mimic how we expect students to use ChatGPT for help on homework questions. We therefore cannot comment on how providing context or altering question wording would affect our results on accuracy, reading level, or topic modeling. We reserve for future research an investigation of the effect of prompt engineering, including different prompt-engineering frameworks, on text analytics of output from various generative AI platforms.