- The paper reviewed HRI studies from 2016 to 2019, identifying prevalent issues in the use of Likert scales, including misused terminology, flawed scale design, and improper statistical methods.
- Common design flaws included using too few items in multi-item scales and failing to verify reliability or validity with metrics such as Cronbach's alpha.
- Statistical errors included inappropriately applying parametric tests to ordinal data and neglecting necessary post-hoc corrections or assumption checks.
The paper "Four Years in Review: Statistical Practices of Likert Scales in Human-Robot Interaction Studies" provides an extensive evaluation of the use of Likert scales within the field of Human-Robot Interaction (HRI). The authors conducted a comprehensive review of papers presented at the International Conference on Human-Robot Interaction from 2016 to 2019, with a focus on identifying and categorizing statistical practices related to Likert scales.
Key Insights and Contributions
- Misuse of the Term "Likert Scale":
- The paper identifies a prevalent misuse of the term "Likert scale," distinguishing it from Likert items and response formats. Properly, a Likert scale is a composite score computed from a set of related items that together measure facets of a single attribute; an individual question is a Likert item, not a scale.
- Common Design Issues:
- A significant portion of the reviewed literature misused Likert items, either by employing scales with too few items or by failing to verify internal consistency and validity with metrics such as Cronbach's alpha. The authors emphasize that multi-item scales should ideally have at least four items to accurately measure complex constructs.
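As an illustration of the internal-consistency check discussed above, Cronbach's alpha can be computed directly from item-level responses. The sketch below uses only the Python standard library; the response data are invented for demonstration and are not from the paper.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-response lists.

    `items[i][j]` is respondent j's answer to item i.
    """
    k = len(items)
    # Variance of each individual item across respondents.
    item_variances = [pvariance(item) for item in items]
    # Variance of each respondent's total (composite) score.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = pvariance(totals)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 5-point Likert responses: 3 items, 6 respondents.
responses = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 1, 5],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")  # alpha = 0.92
```

A common rule of thumb treats alpha of roughly 0.70 or higher as acceptable internal consistency, though the appropriate threshold depends on the construct and the stakes of the study.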
- Statistical Testing Concerns:
- The paper reports widespread misuse of statistical tests, particularly the inappropriate application of parametric tests to individual Likert items. Given the ordinal nature of these items, the authors recommend the use of non-parametric tests unless the data explicitly meet interval assumptions.
- Many papers also fail to apply the requisite post-hoc corrections when conducting multiple comparisons, inflating the risk of Type I errors. The authors further highlight that assumptions of parametric tests, such as normality and homoscedasticity, frequently go unchecked.
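To make the multiple-comparisons point concrete, here is a minimal sketch of one common family-wise correction, the Holm-Bonferroni step-down procedure (the paper discusses post-hoc corrections generally; this particular procedure and the p-values below are illustrative choices, not taken from the paper):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm-Bonferroni correction.

    Returns a list of booleans: True where the null hypothesis
    is rejected at family-wise error rate `alpha`.
    """
    m = len(p_values)
    # Examine p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The threshold relaxes as fewer hypotheses remain.
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values from three pairwise comparisons.
print(holm_bonferroni([0.01, 0.04, 0.03]))  # [True, False, False]
```

Note that an uncorrected threshold of 0.05 would have rejected all three nulls here. For the ordinal-data recommendation above, library routines such as `scipy.stats.mannwhitneyu` or `scipy.stats.wilcoxon` offer non-parametric alternatives to the t-test.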
- Recommendations:
- The authors provide a series of recommendations to improve the integrity of Likert data usage in HRI research. These include using multi-item scales with validated constructs, ensuring proper post-hoc corrections when conducting multiple tests, and systematically checking assumptions of statistical tests.
- Call for Better Practices:
- The paper concludes by urging the HRI community to adopt rigorous practices to ensure the reliability and validity of inferences made from Likert data. It stresses the importance of thorough reporting and validation when designing and analyzing Likert scales to uphold scientific rigor.
- Research Methodology:
- The paper provides a meticulous review of 110 papers from the specified conference proceedings, employing keywords like "Likert," "questionnaire," and "scale" to identify relevant studies. The analysis categorizes papers based on misnomers, improper design, and statistical misuse.
Overall, the paper offers a critical examination of Likert scale usage in HRI research, highlighting areas in need of improvement and providing concrete recommendations for enhancing the quality of statistical analysis in the field.