Feature clustering is a powerful method for reducing the dimensionality of feature vectors in text classification. In this paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters based on a similarity test. Words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. When all the words have been fed in, a desired number of clusters is formed automatically. We then have one extracted feature for each cluster. The extracted feature corresponding to a cluster is a weighted combination of the words contained in that cluster. With this algorithm, the derived membership functions closely match and properly describe the real distribution of the training data. In addition, the user need not specify the number of extracted features in advance, so trial and error to determine the appropriate number of extracted features can be avoided. Experimental results show that our method runs faster and obtains better extracted features than other methods.
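
The clustering idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: each word is represented by a small numeric pattern vector, membership is a product of per-dimension Gaussian-like terms, a word joins the best-matching cluster only if its membership reaches a threshold `rho`, new clusters start with an initial deviation `sigma0`, and the "weighted combination" uses simple 0/1 weights. All function and parameter names are hypothetical.

```python
import math

def membership(x, mean, dev):
    """Product of per-dimension Gaussian-like memberships; a value in (0, 1]."""
    return math.prod(math.exp(-((xi - mi) / di) ** 2)
                     for xi, mi, di in zip(x, mean, dev))

def self_constructing_clusters(patterns, rho=0.5, sigma0=0.25):
    """Feed word patterns one at a time; clusters are created as needed.

    rho (similarity threshold) and sigma0 (initial deviation) are assumed
    parameters for this sketch, not values taken from the paper.
    """
    clusters = []  # each cluster: {"members": [...], "mean": [...], "dev": [...]}
    for idx, x in enumerate(patterns):
        scores = [membership(x, c["mean"], c["dev"]) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < rho:
            # No existing cluster is similar enough: start a new one.
            clusters.append({"members": [idx], "mean": list(x),
                             "dev": [sigma0] * len(x)})
            continue
        # Otherwise join the most similar cluster and refresh its statistics.
        c = clusters[best]
        c["members"].append(idx)
        columns = list(zip(*(patterns[i] for i in c["members"])))
        n = len(c["members"])
        c["mean"] = [sum(col) / n for col in columns]
        c["dev"] = [sigma0 + math.sqrt(sum((v - m) ** 2 for v in col) / n)
                    for col, m in zip(columns, c["mean"])]
    return clusters

def extract_features(doc_vector, clusters):
    """One extracted feature per cluster, using 0/1 (hard) weights:
    the sum of the document's counts for the words in that cluster."""
    return [sum(doc_vector[i] for i in c["members"]) for c in clusters]

# Toy data: four words, each described by a made-up 2-dimensional pattern.
word_patterns = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]
clusters = self_constructing_clusters(word_patterns)
print([c["members"] for c in clusters])          # [[0, 1], [2, 3]]
print(extract_features([3, 1, 0, 2], clusters))  # [4, 2]
```

In this toy run the number of clusters (two) is not supplied by the caller; it follows from the threshold and the data, which reflects the point that the number of extracted features need not be fixed in advance. Recomputing the mean and deviation from all members is a simplification chosen for readability.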