Oddballness: universal anomaly detection with language models

Published 4 Sep 2024 in cs.CL | (2409.03046v1)

Abstract: We present a new method for detecting anomalies in texts (and, more generally, in sequences of any kind of data) using LLMs in a fully unsupervised manner. The method uses the probabilities (likelihoods) generated by an LLM, but instead of focusing on low-likelihood tokens, it relies on a new metric introduced in this paper: oddballness. Oddballness measures how "strange" a given token is according to the LLM. We show on grammatical error detection tasks (a specific case of text anomaly detection) that, in a fully unsupervised setup, oddballness performs better than simply considering low-likelihood events.
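The abstract does not give the oddballness formula. The sketch below assumes one definition with intuitive properties: the score is 0 when the observed token is the model's most likely choice, and it grows toward 1 as more probability mass sits on tokens that outrank it, i.e. the sum over all outcomes j of max(0, p_j - p_i). The function names `low_likelihood_score` and `oddballness` and the toy distributions are hypothetical. This is an illustration of the contrast with low-likelihood scoring, not the paper's implementation.

```python
from typing import Sequence


def low_likelihood_score(probs: Sequence[float], i: int) -> float:
    """Baseline: the less probable the observed token, the more anomalous."""
    return 1.0 - probs[i]


def oddballness(probs: Sequence[float], i: int) -> float:
    """Assumed oddballness-style score for the observed token at index i.

    Sums the amount by which each outcome's probability exceeds p_i.
    It is 0 when the observed token is (one of) the most likely, and
    reaches 1 when p_i == 0 (assuming probs sums to 1).
    """
    p_i = probs[i]
    return sum(max(0.0, p_j - p_i) for p_j in probs)


# Hypothetical next-token distributions from a language model.
flat = [0.25, 0.25, 0.25, 0.25]  # the model is simply uncertain
peaked = [0.90, 0.05, 0.05]      # the model strongly expects token 0

if __name__ == "__main__":
    # Uncertain context: every token is fairly unlikely, but none is odd.
    print(low_likelihood_score(flat, 2), oddballness(flat, 2))
    # Confident context: choosing token 1 is both unlikely and odd.
    print(low_likelihood_score(peaked, 1), oddballness(peaked, 1))
```

In the flat case the baseline assigns 0.75 to every token, while the oddballness-style score is 0, because a low probability caused by the model's uncertainty does not make any token stand out. In the peaked case both scores are high for token 1.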