Appendix - Recommended Statistical Significance Tests for NLP Tasks (1809.01448v1)
Published 5 Sep 2018 in cs.CL
Abstract: Statistical significance testing plays an important role when drawing conclusions from experimental results in NLP papers. In particular, it is a valuable tool for establishing the superiority of one algorithm over another. This appendix complements the guide to statistical significance testing in NLP presented by Dror et al. (2018) by proposing valid statistical tests for the common tasks and evaluation measures in the field.