Papers
Topics
Authors
Recent
Search
2000 character limit reached

The Impact of Discretization Method on the Detection of Six Types of Anomalies in Datasets

Published 27 Aug 2020 in cs.DB, cs.AI, cs.LG, and stat.ML | (2008.12330v1)

Abstract: Anomaly detection is the process of identifying cases, or groups of cases, that are in some way unusual and do not fit the general patterns present in the dataset. Numerous algorithms use discretization of numerical data in their detection processes. This study investigates the effect of the discretization method on the unsupervised detection of each of the six anomaly types acknowledged in a recent typology of data anomalies. To this end, experiments are conducted with various datasets and SECODA, a general-purpose algorithm for unsupervised non-parametric anomaly detection in datasets with numerical and categorical attributes. This algorithm employs discretization of continuous attributes, exponentially increasing weights and discretization cut points, and a pruning heuristic to detect anomalies with an optimal number of iterations. The results demonstrate that standard SECODA can detect all six types, but that different discretization methods favor the discovery of certain anomaly types. The main findings also hold for other detection techniques using discretization.

Citations (5)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.