
Using Fisher's Exact Test to Evaluate Association Measures for N-grams (2104.14209v1)

Published 29 Apr 2021 in cs.CL

Abstract: To determine whether some often-used lexical association measures assign high scores to n-grams that chance could have produced as frequently as observed, we used an extension of Fisher's exact test to sequences longer than two words to analyse a corpus of four million words. The results, based on the precision-recall curve and a new index called chance-corrected average precision, show that, as expected, simple-ll is extremely effective. They also show, however, that MI3 is more efficient than the other hypothesis-test-based measures and even reaches a performance level almost equal to that of simple-ll for 3-grams. It is additionally observed that some measures are more efficient for 3-grams than for 2-grams, while others stagnate.
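
To make the quantities named in the abstract concrete, here is a minimal Python sketch for the 2-gram case only. It is not the paper's code: the abstract does not define the measures, so the formulas for simple-ll and MI3 below are the commonly used textbook definitions and are an assumption here, as are all function and parameter names. The paper's extension of Fisher's exact test to sequences longer than two words is not reproduced.

```python
from math import comb, log, log2

def expected(f1, f2, n):
    """Expected bigram count under independence (assumed definition):
    f1 and f2 are the counts of the first and second word, n the number of bigrams."""
    return f1 * f2 / n

def simple_ll(o, e):
    """Simple log-likelihood, assumed form: 2 * (O * ln(O / E) - (O - E))."""
    return 2 * (o * log(o / e) - (o - e))

def mi3(o, e):
    """Cubed mutual information, assumed form: log2(O^3 / E)."""
    return log2(o ** 3 / e)

def fisher_right_tail(o, f1, f2, n):
    """One-sided Fisher's exact test for a 2x2 bigram table.

    Returns the hypergeometric probability of seeing the bigram at least
    o times, given the marginal counts f1, f2 and the sample size n.
    """
    denom = comb(n, f1)
    tail = sum(
        comb(f2, k) * comb(n - f2, f1 - k)
        for k in range(o, min(f1, f2) + 1)
    )
    return tail / denom

# Hypothetical counts, chosen only for illustration.
o, f1, f2, n = 5, 20, 30, 1000
e = expected(f1, f2, n)
print("expected:", e)
print("simple-ll:", simple_ll(o, e))
print("MI3:", mi3(o, e))
print("Fisher p (one-sided):", fisher_right_tail(o, f1, f2, n))
```

A small p-value from the exact test suggests the n-gram occurs more often than chance alone would produce, which is the reference against which the abstract evaluates the association measures. Exact integer binomials keep this sketch simple, but for corpus-scale counts a log-space or library implementation would likely be more practical.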
