PROJECT TITLE:
Named entity recognition from unstructured handwritten document images - 2016
ABSTRACT:
Named entity recognition is an important topic in the field of natural language processing, whereas in document image processing, such recognition is quite challenging without employing any linguistic knowledge. In this paper we propose an approach to detect named entities (NEs) directly from offline handwritten unstructured document images without explicit character/word recognition, and with very little aid from natural language and script rules. At the preprocessing stage, the document image is binarized, and then the text is segmented into words. Slant, skew, and baseline corrections of the words are also performed. After preprocessing, the words are sent for NE recognition. We analyze the structural and positional characteristics of NEs and extract some relevant features from the word image. A BLSTM neural network is then used for NE recognition. Our system also contains a post-processing stage to reduce the true NE rejection rate. The proposed approach produces encouraging results on both historical and modern document images, including those from an Australian archive, which are reported here for the very first time.
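The first two preprocessing steps named above (binarization, then segmentation into words) can be illustrated with a minimal sketch. The abstract does not say which algorithms the authors used, so everything here is an assumption chosen for illustration: Otsu's global threshold stands in for binarization, and a vertical projection profile with a hypothetical `min_gap` parameter stands in for word segmentation on a single, already-isolated text line. The function names are likewise hypothetical. Slant/skew/baseline correction, feature extraction, and the BLSTM stage are not covered.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale values in 0..255


def otsu_threshold(gray: Image) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of their intensities
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray: Image) -> Image:
    """Map dark pixels (assumed to be ink) to 1 and background to 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def segment_words(binary: Image, min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split one text line into (start, end) column spans, end exclusive.

    A run of at least min_gap ink-free columns separates two words;
    shorter runs are treated as gaps between letters of the same word.
    """
    width = len(binary[0]) if binary else 0
    # Vertical projection profile: ink pixels per column.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None
    gap = 0
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))
    return spans
```

Calling `segment_words(binarize(gray))` on a grayscale text-line image yields one column span per candidate word. A fixed `min_gap` is the simplest possible choice; handwriting with irregular spacing would likely need a threshold derived from the gap distribution of each line.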