Impact of prompt reading level on ChatGPT response accuracy

Determine how the reading level of a user’s prompt affects the accuracy of ChatGPT (GPT3.5, GPT4, and GPT4o-mini) responses to statistics questions, and quantify the relationship between prompt readability and correctness.

Background

The study measured the readability (Flesch–Kincaid and SMOG) of AI-generated answers and showed that prompting at different reading levels changes the readability of the output. However, the authors note that this work does not quantify the effect of prompt reading level on answer accuracy.

They explicitly identify this as a topic for future investigation, linking it to practical classroom usage where students may request simpler explanations that could affect correctness.
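As a rough illustration of how the reading level of a prompt could be scored, the sketch below computes the standard Flesch–Kincaid grade level and SMOG grade for a piece of text. It is not the authors' tooling: the function names are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for simplicity, so scores will differ somewhat from those of dedicated readability software. SMOG is also normally applied to samples of 30 or more sentences, so values for short prompts are only indicative.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("make"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int, int]:
    """Return (sentences, words, syllables, polysyllabic words) for the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)
    return len(sentences), len(words), sum(syllables), polysyllables


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    n_sent, n_words, n_syll, _ = text_stats(text)
    return 0.39 * n_words / n_sent + 11.8 * n_syll / n_words - 15.59


def smog_grade(text: str) -> float:
    """SMOG grade: 1.0430*sqrt(polysyllables * 30 / sentences) + 3.1291."""
    n_sent, _, _, n_poly = text_stats(text)
    return 1.0430 * math.sqrt(n_poly * 30 / n_sent) + 3.1291


if __name__ == "__main__":
    # Hypothetical prompts, used only to show that the scores separate them.
    simple = "The cat sat."
    harder = "Calculate the probability of obtaining statistically significant results."
    for prompt in (simple, harder):
        print(round(flesch_kincaid_grade(prompt), 2), round(smog_grade(prompt), 2))
```

Scores like these could be recorded for each prompt alongside whether the model's answer was correct, which is the pairing the open question calls for.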

References

Certainly the effect of reading level of the prompt on accuracy of the response is another issue to be addressed in future work.

— Generative AI Takes a Statistics Exam: A Comparison of Performance between ChatGPT3.5, ChatGPT4, and ChatGPT4o-mini (2501.09171 - McGee et al., 15 Jan 2025) in Section 6, Discussion and Conclusion
