Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure (1611.03641v2)
Published 11 Nov 2016 in cs.CL
Abstract: We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation, and we do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure which takes the reliability of each annotation decision in the dataset into account.