Tuned Models of Peer Assessment in MOOCs (1307.2579v1)

Published 9 Jul 2013 in cs.LG, cs.AI, cs.HC, stat.AP, and stat.ML

Abstract: In massive open online courses (MOOCs), peer grading serves as a critical tool for scaling the grading of complex, open-ended assignments to courses with tens or hundreds of thousands of students. But despite promising initial trials, it does not always deliver accurate results compared to human experts. In this paper, we develop algorithms for estimating and correcting for grader biases and reliabilities, showing significant improvement in peer grading accuracy on real data with 63,199 peer grades from Coursera's HCI course offerings --- the largest peer grading networks analysed to date. We relate grader biases and reliabilities to other student factors such as student engagement, performance as well as commenting style. We also show that our model can lead to more intelligent assignment of graders to gradees.

Authors (6)
  1. Chris Piech (33 papers)
  2. Jonathan Huang (46 papers)
  3. Zhenghao Chen (30 papers)
  4. Chuong Do (2 papers)
  5. Andrew Ng (21 papers)
  6. Daphne Koller (40 papers)
Citations (397)

Summary

Detailed Analysis of "Tuned Models of Peer Assessment in MOOCs"

The paper, entitled "Tuned Models of Peer Assessment in MOOCs," presents statistical methods for improving the accuracy and scalability of peer grading in Massive Open Online Courses (MOOCs). This research matters because MOOCs enroll tens or hundreds of thousands of students, creating the need for scalable grading of complex, open-ended assignments.

Core Contributions

The central contribution of this paper is a family of probabilistic models that estimate and correct for grader biases and reliabilities in peer assessment. The authors analyze 63,199 peer grades from Coursera's Human-Computer Interaction (HCI) course offerings, the largest peer grading network analysed to date, and use this substantial dataset to model and improve peer assessment mechanisms within MOOCs.

The authors compare several models of increasing complexity:

  1. Grader Bias and Reliability Model: introduces per-grader bias and reliability terms with priors over the latent variables, allowing grading tendencies to vary across students (a simplified inference sketch follows the results paragraph below).
  2. Temporal Coherence Model: links a grader's bias across successive assignments, exploiting the moderate correlation between a grader's biases over time to provide temporal continuity.
  3. Coupled Grader Score and Reliability Model: couples a student's own submission score with their grading ability, hypothesizing that performance on an assignment correlates with reliability as a grader.

These models substantially increase grading accuracy. Notably, modeling grader bias alone reduces root-mean-squared error (RMSE) by over 30% relative to the baseline median-grade approach employed by Coursera.
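To make the bias-and-reliability idea concrete, here is a minimal sketch of this kind of model. It is an illustrative reconstruction, not the authors' code: the generative assumption (an observed grade equals the submission's true score plus a grader-specific bias, with noise whose precision is the grader's reliability), the hyperparameter values, and the simple point-estimate coordinate updates are all assumptions of this sketch; the paper performs fuller Bayesian inference over its models.

```python
# Sketch of a grader bias/reliability model, assuming each observed grade is
#     z[s, g] ~ Normal(score[s] + bias[g], 1 / rel[g])
# with Gaussian priors on scores and biases and a Gamma prior on reliability.
# All hyperparameters below (mu0, gamma0, eta0, alpha0, beta0) are illustrative.
import numpy as np

def fit_peer_grades(grades, n_subs, n_graders, n_iters=50,
                    mu0=70.0, gamma0=0.1, eta0=1.0, alpha0=2.0, beta0=2.0):
    """grades: list of (submission_id, grader_id, observed_grade) triples."""
    subs = np.array([s for s, g, z in grades])
    grs = np.array([g for s, g, z in grades])
    obs = np.array([z for s, g, z in grades], dtype=float)

    score = np.full(n_subs, mu0)   # estimated true score per submission
    bias = np.zeros(n_graders)     # per-grader additive bias
    rel = np.ones(n_graders)       # per-grader precision ("reliability")

    for _ in range(n_iters):
        # Scores: precision-weighted mean of debiased grades, shrunk to mu0.
        num = np.bincount(subs, weights=rel[grs] * (obs - bias[grs]),
                          minlength=n_subs)
        den = np.bincount(subs, weights=rel[grs], minlength=n_subs)
        score = (gamma0 * mu0 + num) / (gamma0 + den)

        # Biases: reliability-weighted mean residual, shrunk toward zero.
        resid = obs - score[subs]
        num = np.bincount(grs, weights=rel[grs] * resid, minlength=n_graders)
        den = np.bincount(grs, weights=rel[grs], minlength=n_graders)
        bias = num / (eta0 + den)

        # Reliabilities: posterior-mean precision under a Gamma(alpha0, beta0)
        # prior, given each grader's squared residuals.
        sq = (obs - score[subs] - bias[grs]) ** 2
        ssq = np.bincount(grs, weights=sq, minlength=n_graders)
        cnt = np.bincount(grs, minlength=n_graders)
        rel = (alpha0 + cnt / 2.0) / (beta0 + ssq / 2.0)

    return score, bias, rel
```

In this formulation, estimated scores are precision-weighted averages of debiased grades, which is why correcting for bias alone can already move RMSE substantially relative to taking a plain median of raw grades.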

Implications of Findings

The implications of this paper extend beyond the grading algorithm itself. The research also reveals the potential of integrating social and engagement signals into peer grading: grader attributes such as bias and grading reliability offer an additional lens on student engagement and participation, particularly in distance learning platforms.

Moreover, the paper addresses fair workload distribution among graders, suggesting dynamic allocation mechanisms in which the level of confidence in a student's current grade estimate guides the assignment of additional graders. This could lead to more balanced and accurate grading, reinforcing the need for equity in peer assessment systems.
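As a toy illustration of such a confidence-driven policy (a hypothetical construction, not a mechanism specified in the paper), a scheduler could route each newly available grader to the submission whose current grade estimate carries the most uncertainty:

```python
# Hypothetical confidence-guided grader allocation: serve the submission with
# the highest posterior variance first. The "halve the variance per extra
# grade" rule below is a crude modeling assumption for illustration only.
import heapq

def allocate_graders(posterior_var, grader_queue):
    """posterior_var: dict submission_id -> variance of its grade estimate.
    grader_queue: iterable of grader ids each ready to grade one submission."""
    # Max-heap on variance (negated): most uncertain submission served first.
    heap = [(-var, sub) for sub, var in posterior_var.items()]
    heapq.heapify(heap)
    plan = []
    for grader in grader_queue:
        neg_var, sub = heapq.heappop(heap)
        plan.append((grader, sub))
        # Assume one extra grade roughly halves the remaining variance,
        # so the submission re-enters the queue with lower priority.
        heapq.heappush(heap, (neg_var / 2.0, sub))
    return plan

# Example: three submissions with differing uncertainty, two free graders.
print(allocate_graders({"s1": 4.0, "s2": 1.0, "s3": 9.0}, ["gA", "gB"]))
# -> [('gA', 's3'), ('gB', 's3')]  (s3 stays most uncertain: 9/2 = 4.5 > 4.0)
```

Under such a scheme, grading effort concentrates where confidence is lowest, which is the intuition behind the paper's suggestion of confidence-guided reassignment.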

Future Directions

The work lays a foundation for further exploration in several areas:

  • Grader Assignment Mechanics: refining grader-to-gradee matching based on compatibility measures is a natural area for development, potentially using natural language processing techniques to infer compatibility from graders' written feedback.
  • Ethical Considerations: There's a need for further exploration into the ethical deployment of automated grading systems, specifically how transparent and interpretable these systems need to be for the students.
  • Incentive Structures: Investigating incentives for graders to provide high-quality assessments, supported by insights from game theory, could further reinforce the utility of such models.

Conclusion

This paper provides significant advancements in the domain of peer grading in MOOCs, driven by rigorous data and sophisticated probabilistic models. By successfully addressing the challenges of grading scalability and fairness, the research presents viable strategies for enhancing the quality of educational experiences on massive online platforms. The paper not only improves current methodologies but also offers insights into student engagement, thereby adding value in both theoretical and practical dimensions of educational data science.