- The paper reviewed HRI studies from 2016 to 2019, identifying prevalent issues in the use of Likert scales, including misused terminology, flawed scale design, and improper statistical methods.
- Common design flaws included using too few items in multi-item scales and failing to verify reliability or validity with metrics such as Cronbach's alpha.
- Statistical errors included inappropriately applying parametric tests to ordinal data and neglecting necessary post-hoc corrections or assumption checks.
The paper "Four Years in Review: Statistical Practices of Likert Scales in Human-Robot Interaction Studies" provides an extensive evaluation of the use of Likert scales within the field of Human-Robot Interaction (HRI). The authors conducted a comprehensive review of papers presented at the International Conference on Human-Robot Interaction from 2016 to 2019, with a focus on identifying and categorizing statistical practices related to Likert scales.
Key Insights and Contributions
- Misuse of the Term "Likert Scale":
- The paper identifies a prevalent misuse of the term "Likert scale," distinguishing it from Likert items and response formats. Properly, a Likert scale is a composite score computed from a set of related items that together measure facets of a single attribute; an individual question is a Likert item, not a scale.
- Common Design Issues:
- A significant portion of the reviewed literature misused Likert items, either by employing scales with too few items or by failing to verify internal consistency and validity with metrics such as Cronbach's alpha. The authors emphasize that multi-item scales should ideally have at least four items to accurately measure complex constructs.
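As an illustration of the internal-consistency check discussed above, Cronbach's alpha can be computed directly from item-level responses. The sketch below uses only the Python standard library; the response data are invented for demonstration and are not from the paper.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-response lists.

    `items[i][j]` is respondent j's answer to item i.
    """
    k = len(items)
    # Variance of each individual item across respondents.
    item_variances = [pvariance(item) for item in items]
    # Variance of each respondent's total (composite) score.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = pvariance(totals)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 5-point Likert responses: 3 items, 6 respondents.
responses = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 1, 5],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")  # alpha = 0.92
```

A common rule of thumb treats alpha of roughly 0.70 or higher as acceptable internal consistency, though the appropriate threshold depends on the construct and the stakes of the study.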
- Statistical Testing Concerns:
- The paper reports widespread misuse of statistical tests, particularly the inappropriate application of parametric tests to individual Likert items. Given the ordinal nature of these items, the authors recommend the use of non-parametric tests unless the data explicitly meet interval assumptions.
- Many papers also fail to apply the requisite post-hoc corrections when conducting multiple comparisons, inflating the risk of Type I errors. The authors further highlight that assumptions of parametric tests, such as normality and homoscedasticity, frequently go unchecked.
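To make the multiple-comparisons point concrete, here is a minimal sketch of one common family-wise correction, the Holm-Bonferroni step-down procedure (the paper discusses post-hoc corrections generally; this particular procedure and the p-values below are illustrative choices, not taken from the paper):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm-Bonferroni correction.

    Returns a list of booleans: True where the null hypothesis
    is rejected at family-wise error rate `alpha`.
    """
    m = len(p_values)
    # Examine p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The threshold relaxes as fewer hypotheses remain.
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values from three pairwise comparisons.
print(holm_bonferroni([0.01, 0.04, 0.03]))  # [True, False, False]
```

Note that an uncorrected threshold of 0.05 would have rejected all three nulls here. For the ordinal-data recommendation above, library routines such as `scipy.stats.mannwhitneyu` or `scipy.stats.wilcoxon` offer non-parametric alternatives to the t-test.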
- Recommendations:
- The authors provide a series of recommendations to improve the integrity of Likert data usage in HRI research. These include using multi-item scales with validated constructs, ensuring proper post-hoc corrections when conducting multiple tests, and systematically checking assumptions of statistical tests.
- Call for Better Practices:
- The paper concludes by urging the HRI community to adopt rigorous practices to ensure the reliability and validity of inferences made from Likert data. It stresses the importance of thorough reporting and validation when designing and analyzing Likert scales to uphold scientific rigor.
- Research Methodology:
- The paper provides a meticulous review of 110 papers from the specified conference proceedings, employing keywords like "Likert," "questionnaire," and "scale" to identify relevant studies. The analysis categorizes papers based on misnomers, improper design, and statistical misuse.
Overall, the paper offers a critical examination of Likert scale usage in HRI research, highlighting areas in need of improvement and providing concrete recommendations for enhancing the quality of statistical analysis in the field.