A Lemma Based Evaluator for Semitic Language Text Summarization Systems (1403.5596v1)
Published 22 Mar 2014 in cs.CL and cs.IR
Abstract: Matching texts in highly inflected languages such as Arabic with a simple stemming strategy is unlikely to perform well. In this paper, we present an automatic text matching technique for inflectional languages, using Arabic as the test case. The system is an extension of the ROUGE test in which texts are matched at the level of each token's lemma. The experimental results show improved detection of similarities between sentences that have the same semantics but are written in different lexical forms.
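The idea of matching at the lemma level can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a ROUGE-1-style unigram recall, whitespace tokenization, and a hypothetical toy lemma table (`TOY_LEMMAS`, in English for readability) standing in for a real Arabic morphological analyzer. All function names are illustrative.

```python
from collections import Counter

# Hypothetical toy lemma table (an assumption, not from the paper).
# A real system would query an Arabic morphological analyzer instead.
TOY_LEMMAS = {
    "books": "book",
    "wrote": "write",
    "writes": "write",
    "students": "student",
}


def lemmatize(tokens, lemma_table):
    """Map each token to its lemma, falling back to the lowercased token."""
    return [lemma_table.get(tok.lower(), tok.lower()) for tok in tokens]


def unigram_recall(candidate, reference):
    """ROUGE-1-style recall: clipped unigram overlap divided by reference length."""
    if not reference:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Each reference unigram is credited at most as often as it occurs in the candidate.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / len(reference)


def lemma_unigram_recall(candidate_text, reference_text, lemma_table=TOY_LEMMAS):
    """Same recall, but computed over lemmas instead of surface forms."""
    cand = lemmatize(candidate_text.split(), lemma_table)
    ref = lemmatize(reference_text.split(), lemma_table)
    return unigram_recall(cand, ref)


reference = "the student wrote books"
candidate = "the students writes a book"

surface_score = unigram_recall(candidate.split(), reference.split())
lemma_score = lemma_unigram_recall(candidate, reference)
print(surface_score, lemma_score)
```

In this toy pair only "the" matches on surface forms, while every reference token matches once both sides are reduced to lemmas, which is the kind of gap between lexical form and meaning that the abstract describes.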