K-NS: Section-Based Outlier Detection in High Dimensional Space

Published 5 May 2014 in cs.AI, cs.LG, and stat.ML | (1405.1027v1)

Abstract: Finding rare information hidden in a huge amount of data from the Internet is a necessary but complex issue. Many researchers have studied this issue and have found effective methods to detect anomaly data in low dimensional space. However, as the dimension increases, most of these existing methods perform poorly in detecting outliers because of "high dimensional curse". Even though some approaches aim to solve this problem in high dimensional space, they can only detect some anomaly data appearing in low dimensional space and cannot detect all of anomaly data which appear differently in high dimensional space. To cope with this problem, we propose a new k-nearest section-based method (k-NS) in a section-based space. Our proposed approach not only detects outliers in low dimensional space with section-density ratio but also detects outliers in high dimensional space with the ratio of k-nearest section against average value. After taking a series of experiments with the dimension from 10 to 10000, the experiment results show that our proposed method achieves 100% precision and 100% recall result in the case of extremely high dimensional space, and better improvement in low dimensional space compared to our previously proposed method.