Effect of sampling temperature on OCR and grading performance

Investigate how the choice of sampling temperature affects the performance of the GPT-4V/GPT-4-based AI-assisted grading pipeline for handwritten mathematics exams, and determine optimal temperature settings with respect to recognition rates, grading accuracy, and reliability.

Background

The paper experimented with different temperatures for OCR and grading (e.g., GPT-4V at T=0 and T=1 for pre-extracted boxes; T=0.7 for whole-page OCR; sampling-based grading at T=0.7), observing differences in recognition of short mathematical expressions and overall pipeline behavior. However, the computational cost prevented a systematic exploration of temperature effects.

Understanding the influence of temperature is important because it directly affects both the variability and the potential accuracy of OCR transcriptions and grading outputs, and therefore any confidence measure derived from repeated sampling.
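To make the link between temperature, variability, and sampling-based confidence concrete, here is a minimal toy sketch. It does not call GPT-4V or GPT-4 and is not the paper's pipeline: the logits, the grade labels, and the function names (`softmax_with_temperature`, `agreement_confidence`, `simulate_grading`) are all hypothetical and chosen only for illustration. It assumes the standard temperature-scaled softmax and uses the share of samples that match the majority answer as one possible confidence measure.

```python
import math
import random
from collections import Counter


def softmax_with_temperature(logits, temperature):
    """Turn logits into sampling probabilities at a temperature > 0."""
    if temperature <= 0:
        # T=0 is the greedy (argmax) limit and is not handled by this sketch.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def agreement_confidence(samples):
    """Return the majority answer and the fraction of samples agreeing with it."""
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)


def simulate_grading(logits, labels, temperature, n_samples, seed=0):
    """Draw n_samples hypothetical grades and summarise their agreement."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    samples = rng.choices(labels, weights=probs, k=n_samples)
    return agreement_confidence(samples)


if __name__ == "__main__":
    # Invented logits over three hypothetical grades for a single answer.
    labels = ["0 pts", "1 pt", "2 pts"]
    logits = [0.0, 1.0, 2.0]
    for t in (0.2, 0.7, 1.0):
        grade, conf = simulate_grading(logits, labels, t, n_samples=20)
        print(f"T={t}: majority={grade}, agreement={conf:.2f}")
```

In this toy model, lower temperatures concentrate probability on the top-scoring grade, so repeated samples agree more often, while higher temperatures spread probability and lower the agreement. Whether higher agreement also means higher accuracy in the real pipeline is exactly what remains unexplored.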

References

Due to the high computational effort involved, the effect of the temperature could not be fully explored.

AI-assisted Automated Short Answer Grading of Handwritten University Level Mathematics Exams (2408.11728 - Liu et al., 21 Aug 2024) in Section 6 (Limitations)