Clustering Algorithm for Gujarati Language

Published 20 Jul 2013 in cs.CL | (1307.5393v1)

Abstract: Natural language processing is still an active research area, and it now draws attention from researchers worldwide. It involves analyzing a language based on its structure and then tagging each word with its appropriate grammatical category. Here we take a set of 50,000 tagged Gujarati words and cluster them using an algorithm that we define ourselves. Many clustering techniques are available, e.g. single linkage, complete linkage, and average linkage. Here the number of clusters to be formed is not known in advance, so it depends entirely on the data set provided. Clustering is a preprocessing step for stemming. Stemming is the process of extracting the root from a word, e.g. cats = cat + s, where cat is a noun and s marks the plural form.
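
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes single linkage with a distance cutoff, so the number of clusters is not fixed in advance, and it uses a hypothetical prefix-based distance. The function names, the threshold value, and the toy English words (chosen to follow the cats example) are all illustrative assumptions.

```python
from itertools import combinations


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_distance(a: str, b: str) -> float:
    """Hypothetical distance: 0.0 for identical words, 1.0 for no shared prefix."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - common_prefix_len(a, b) / longest


def single_linkage_clusters(words, threshold):
    """Group words linked by a chain of pairs with distance <= threshold.

    Cutting a single-linkage hierarchy at a fixed distance is equivalent to
    taking connected components of the "distance <= threshold" graph, so the
    number of clusters comes from the data rather than from a parameter.
    """
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if prefix_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return sorted(sorted(c) for c in clusters.values())


def candidate_root(cluster):
    """Illustrative heuristic only: the prefix shared by every word in a cluster."""
    first = cluster[0]
    return first[: min(common_prefix_len(first, w) for w in cluster)]


# Toy data (assumed for illustration; not taken from the tagged word set).
words = ["cat", "cats", "dog", "dogs", "run", "running", "runs"]
clusters = single_linkage_clusters(words, threshold=0.6)
roots = [candidate_root(c) for c in clusters]
print(clusters)
print(roots)
```

With these toy words and a threshold of 0.6, the words fall into three groups, and the shared prefix of each group serves as a rough candidate root. This is meant only to illustrate how clustering can precede stemming; a real stemmer for Gujarati would need language-specific suffix rules.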