Ambiguity and uniqueness of correct answers in MedQA (USMLE) questions

Ascertain whether MedQA (USMLE) questions that request the "best," "most likely," or "most appropriate" option truly admit a unique correct answer, or whether multiple answer options can simultaneously be acceptable for a given case.

Background

MedQA (USMLE) is a widely used benchmark of multiple-choice questions designed to test clinical knowledge and reasoning. Saab et al. (2024) had multiple U.S.-based physicians relabel the test set to assess data quality, checking for missing information, label errors, and potential ambiguity.

During this analysis, the authors note that many MedQA questions ask for the "best," "most likely," or "most appropriate" option. It remains uncertain whether such questions necessarily have a single uniquely correct answer, which raises concerns about ambiguity and its implications for evaluating model performance and rater agreement.

This unresolved issue affects how accuracy should be interpreted on MedQA and suggests the need for methods to determine when multiple answers may be justified for a given clinical scenario.
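One simple way to make "multiple answers may be justified" operational is to look at how independent raters split their votes on a question. The sketch below is only an illustration and is not the procedure used by the authors: the vote format, the function names acceptable_options and is_ambiguous, and the 0.3 support threshold are all assumptions made for this example, and the votes shown are invented.

```python
from collections import Counter


def acceptable_options(votes, min_share=0.3):
    """Return the options chosen by at least `min_share` of raters, sorted.

    `votes` is a list of option labels, one per rater (hypothetical format).
    `min_share` is an arbitrary illustrative threshold, not a value from the source.
    """
    if not votes:
        return []
    counts = Counter(votes)
    total = len(votes)
    return sorted(opt for opt, n in counts.items() if n / total >= min_share)


def is_ambiguous(votes, min_share=0.3):
    """Flag a question when more than one option has substantial rater support."""
    return len(acceptable_options(votes, min_share)) > 1


# Invented rater votes for two hypothetical questions (not data from the paper).
example_votes = {
    "q1": ["B", "B", "B"],  # unanimous
    "q2": ["A", "C", "A"],  # raters split between two options
}
flags = {qid: is_ambiguous(v) for qid, v in example_votes.items()}
```

Under these assumptions, a flagged question could be scored against the set of supported options rather than a single key, or reported separately from unambiguous questions. Vote splits alone cannot distinguish genuine clinical ambiguity from rater error, so a flag like this would only be a starting point for expert review.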

References

However, it is largely unclear whether answers do indeed only allow for one option to be e.g. the "best next step in management" of a case.

Capabilities of Gemini Models in Medicine (2404.18416 - Saab et al., 29 Apr 2024) in Appendix C, Subsection 'MedQA (USMLE) Relabeling'