Source and Integration of Human Preferences for RLHF in Educational Tutoring
Determine whether the human preference signals used to train reward models for reinforcement learning from human feedback (RLHF) in educational tutoring should be elicited from learners, from educators, or from both, and establish principled methods for combining the two preference sources when both are used.
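To make the combination question concrete, the sketch below shows one naive baseline, not a method proposed in the source paper: train a single Bradley-Terry reward model on preference pairs from both annotator pools and interpolate the two pairwise losses with a scalar mixing weight. The `reward_model` callable, the feature shapes, and the weight `alpha` are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Standard pairwise reward-model loss used in RLHF:
    # minimize -log sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()


def mixed_preference_loss(
    reward_model,        # callable: response features -> scalar rewards
    learner_pairs,       # (chosen, rejected) feature tensors labeled by learners
    educator_pairs,      # (chosen, rejected) feature tensors labeled by educators
    alpha: float = 0.5,  # learner weight; a free, purely illustrative parameter
) -> torch.Tensor:
    lc, lr = learner_pairs
    ec, er = educator_pairs
    learner_loss = bradley_terry_loss(reward_model(lc), reward_model(lr))
    educator_loss = bradley_terry_loss(reward_model(ec), reward_model(er))
    # Convex combination of the two preference sources. How to set alpha,
    # or whether a fixed scalar is even appropriate, is exactly the open
    # question this problem statement raises.
    return alpha * learner_loss + (1.0 - alpha) * educator_loss


# Toy usage with a linear reward head over 16-dim response features.
if __name__ == "__main__":
    head = torch.nn.Linear(16, 1)
    rm = lambda x: head(x).squeeze(-1)
    learner_pairs = (torch.randn(8, 16), torch.randn(8, 16))
    educator_pairs = (torch.randn(8, 16), torch.randn(8, 16))
    print(mixed_preference_loss(rm, learner_pairs, educator_pairs, alpha=0.7))
```

A fixed scalar weight is the simplest possible scheme; the open problem is precisely that such weights lack principled grounding, for instance when the two pools disagree (learners might favor more engaging responses while educators favor pedagogically sounder ones).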
References
"It is also not clear whether the preferences should be elicited from the learners, educators or both, and how they should be combined if it is the latter."
— Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach
(arXiv:2407.12687, Jurenka et al., 21 May 2024), in the appendix section "Challenges with eliciting human preferences for pedagogy"