PROJECT TITLE:
Inference of Regular Expressions for Text Extraction from Examples
A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this pattern may be described by a regular expression. In this work, we consider the long-standing problem of synthesizing such expressions automatically, based solely on examples of the desired behavior. We present the design and implementation of a system capable of addressing extraction tasks of realistic complexity. Our system is based on an evolutionary procedure carefully tailored to the specific needs of regular expression generation by examples. The procedure executes a search driven by a multiobjective optimization strategy aimed at simultaneously improving multiple performance indexes of candidate solutions while ensuring an adequate exploration of the huge solution space. We assess our proposal experimentally in great depth, on a number of challenging datasets. The accuracy of the obtained solutions appears adequate for practical usage and improves significantly over earlier proposals. Most importantly, our results are highly competitive even with respect to human operators. A prototype is available as a web application at http://regex.inginf.units.it.
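To make the idea of a multiobjective search over candidate expressions more concrete, the following is a minimal sketch of how candidate regexes could be scored against examples and how Pareto dominance selects the non-dominated ones. The abstract does not specify which performance indexes are used, so the two objectives here (exact-span extraction errors and pattern length, both minimized) are assumptions for illustration, and the names `extraction_errors`, `objectives`, `dominates`, and `pareto_front` are hypothetical. The real system's fitness indexes, genetic operators, and selection scheme may differ.

```python
import re
from typing import List, Tuple

# An example pairs a text with the (start, end) spans that should be extracted.
Example = Tuple[str, List[Tuple[int, int]]]
Scores = Tuple[int, int]


def extraction_errors(pattern: str, examples: List[Example]) -> int:
    """Count false positives plus false negatives of a candidate regex.

    Hypothetical objective: extracted spans must match the desired spans exactly.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 10**9  # syntactically invalid candidates get a very bad score
    errors = 0
    for text, wanted in examples:
        # Ignore empty matches, which never correspond to an extracted entity.
        found = {m.span() for m in compiled.finditer(text) if m.end() > m.start()}
        target = set(wanted)
        errors += len(found - target) + len(target - found)
    return errors


def objectives(pattern: str, examples: List[Example]) -> Scores:
    # Assumed objectives, both minimized: extraction errors and pattern length.
    return (extraction_errors(pattern, examples), len(pattern))


def dominates(a: Scores, b: Scores) -> bool:
    # a dominates b if it is no worse on every objective and better on at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[str], examples: List[Example]) -> List[str]:
    # Keep every candidate that no other candidate dominates, preserving order.
    scored = [(c, objectives(c, examples)) for c in candidates]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


if __name__ == "__main__":
    examples = [("Call 555-1234 or 555-9876.", [(5, 13), (17, 25)])]
    candidates = [r"\d+", r"\d{3}-\d{4}", r"\d+-\d+", r"[0-9]{3}-[0-9]{4}", r"."]
    print(pareto_front(candidates, examples))  # → ['\\d+', '\\d+-\\d+', '.']
```

In this toy run, `\d+-\d+` extracts both phone numbers exactly and dominates the longer but equally accurate alternatives. The front still retains short, inaccurate candidates such as `.`, which illustrates how a multiobjective strategy can preserve diversity in the population rather than collapsing onto a single best solution.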