Improving Quality of Clustering using Cellular Automata for Information retrieval

Published 13 Jan 2014 in cs.IR | (1401.2684v1)

Abstract: Clustering has been widely applied to Information Retrieval (IR) on the grounds of its potential improved effectiveness over inverted file search. Clustering is a mostly unsupervised procedure and the majority of the clustering algorithms depend on certain assumptions in order to define the subgroups present in a data set .A clustering quality measure is a function that, given a data set and its partition into clusters, returns a non-negative real number representing the quality of that clustering. Moreover, they may behave in a different way depending on the features of the data set and their input parameters values. Therefore, in most applications the resulting clustering scheme requires some sort of evaluation as regards its validity. The quality of clustering can be enhanced by using a Cellular Automata Classifier for information retrieval. In this study we take the view that if cellular automata with clustering is applied to search results (query-specific clustering), then it has the potential to increase the retrieval effectiveness compared both to that of static clustering and of conventional inverted file search. We conducted a number of experiments using ten document collections and eight hierarchic clustering methods. Our results show that the effectiveness of query-specific clustering with cellular automata is indeed higher and suggest that there is scope for its application to IR.