Text categorization is the task of assigning predefined categories to natural language text. Under the widely used 'bag of words' representation, previous research typically assigns each word a value that indicates whether the word appears in the document or how frequently it appears. Although these values are useful for text categorization, they do not fully express the abundant information contained in the document. This paper explores the effect of other types of values, which express the distribution of a word in the document. These novel values assigned to a word are called distributional features, which include the compactness of the appearances of the word and the position of the first appearance of the word. The proposed distributional features are exploited by a tf-idf style equation, and different features are combined using ensemble learning techniques. Experiments show that the distributional features are useful for text categorization. Compared with using the traditional term frequency values alone, including the distributional features requires only a little additional cost, while the categorization performance can be significantly improved. Further analysis shows that the distributional features are especially useful when documents are long and the writing style is casual.
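To make the idea of distributional features concrete, the following is a minimal sketch rather than the paper's actual formulation. It assumes a document is cut into a fixed number of equal-length parts and that compactness can be approximated by how many parts contain the word and by the distance between its first and last appearance. The function names, the `num_parts` parameter, and the feature keys are all hypothetical choices made for illustration.

```python
def distribution_array(tokens, word, num_parts=4):
    """Count occurrences of `word` in each of `num_parts` equal-length slices.

    Splitting into equal-length slices is an assumption of this sketch;
    sentences or paragraphs could serve as parts instead.
    """
    counts = [0] * num_parts
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == word:
            # Map token index i to a part index in 0..num_parts-1.
            counts[i * num_parts // n] += 1
    return counts


def distributional_features(tokens, word, num_parts=4):
    """Return illustrative distributional values for `word`, or None if absent."""
    counts = distribution_array(tokens, word, num_parts)
    present = [i for i, c in enumerate(counts) if c > 0]
    if not present:
        return None
    return {
        "tf": sum(counts),                            # traditional term frequency
        "first_app": present[0],                      # part of the first appearance
        "parts_with_word": len(present),              # fewer parts = more compact
        "first_last_dist": present[-1] - present[0],  # smaller = more compact
    }


if __name__ == "__main__":
    doc = "a b a c d e f a".split()
    print(distributional_features(doc, "a"))
    print(distributional_features(doc, "f"))
```

Two words with the same term frequency can receive different values here: one mentioned only in the opening part and one scattered across the whole document differ in `first_app`, `parts_with_word`, and `first_last_dist`, which is the kind of information a plain frequency count discards.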