A Differential Index Measuring Rater's Capability in Educational Assessment (2502.09099v1)

Published 13 Feb 2025 in stat.AP

Abstract: A rater's ability to assign accurate scores can significantly impact the outcomes of educational assessments. However, common indices for evaluating rater characteristics typically focus on either severity or discrimination ability (i.e., the skill of differentiating between students). Additionally, these indices are often developed without considering the rater's accuracy in scoring students at different ability levels. To address these limitations, this study proposes a single-value measure of a rater's capability to assign accurate scores to students with varying ability levels. The measure is derived from the partial derivatives of each rater's passing rate with respect to student ability. Mathematical derivations of the index under generalized multi-facet models and hierarchical rater models are provided. To ease implementation of the index, this study develops parameter estimation using the marginal likelihood and its Laplace approximation, which allows for efficient evaluation and processing of large datasets involving numerous students and raters. Simulation studies demonstrate the accuracy of parameter recovery using the approximate likelihood and show how the capability indices vary with different levels of rater severity. An empirical study further tests the practical applicability of the new measure, with raters evaluating essays on four topics: "family," "school," "sport," and "work." Results show that raters are most capable when rating the topic of family and least capable when rating sport, with individual raters displaying different capabilities across the topics.
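To make the idea of "the partial derivative of a rater's passing rate with respect to student ability" concrete, here is a minimal Python sketch. It is not the paper's index: the abstract does not give the formula, so the sketch assumes a simple logistic passing-rate curve with a hypothetical rater `severity` and `discrimination`, and the function names are illustrative only. It shows how the slope of the passing rate can be computed analytically and checked numerically.

```python
import math


def pass_probability(theta, severity, discrimination=1.0):
    """Assumed logistic passing rate for one rater (illustrative, not the paper's model).

    theta: student ability; severity and discrimination are hypothetical rater parameters.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - severity)))


def pass_rate_slope(theta, severity, discrimination=1.0):
    """Analytic derivative of the assumed passing rate with respect to theta."""
    p = pass_probability(theta, severity, discrimination)
    return discrimination * p * (1.0 - p)


def numeric_slope(theta, severity, discrimination=1.0, h=1e-5):
    """Central finite-difference check of the same derivative."""
    upper = pass_probability(theta + h, severity, discrimination)
    lower = pass_probability(theta - h, severity, discrimination)
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    # Compare a lenient and a severe hypothetical rater across ability levels.
    for severity in (-1.0, 1.0):
        for theta in (-2.0, 0.0, 2.0):
            slope = pass_rate_slope(theta, severity)
            print(f"severity={severity:+.1f} theta={theta:+.1f} slope={slope:.4f}")
```

Under this assumed curve, the slope is largest where student ability equals the rater's severity and shrinks as ability moves away from it, which illustrates why a derivative-based measure can depend on both the rater's severity and the ability level being scored.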
