PROJECT TITLE :
Similar Subtree Search Using Extended Tree Inclusion
This paper considers the problem of identifying all locations of subtrees in a large tree, or in a large collection of trees, that are similar to a specified pattern tree, where all trees are assumed to be rooted and node-labeled. The tree edit distance is a widely used measure of tree (dis-)similarity, but it is NP-hard to compute for unordered trees. To cope with this issue, we propose a new similarity measure that extends the concept of unordered tree inclusion by taking the costs of insertion and substitution operations on the pattern tree into account, and we present an algorithm for computing it. Our algorithm has the same time complexity as the original algorithm for unordered tree inclusion when the maximum outdegree of the pattern tree is bounded by a constant. Our experimental evaluation using synthetic and real datasets confirms that the proposed algorithm is fast and scalable, and that it is very useful for bibliographic matching, which is a typical entity resolution problem for tree-structured data. Furthermore, we extend our algorithm to also allow a constant number of deletion operations on the pattern tree while still running within the same time bound.
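To make the underlying notion concrete, here is a minimal Python sketch of plain unordered tree inclusion, the concept that the proposed measure extends. It is not the paper's algorithm: it requires exact label matches and knows nothing about insertion, substitution, or deletion costs. The names `Node` and `find_inclusions` are hypothetical, chosen for illustration only. The sketch uses the standard subset-combination idea: for each text node it records which subsets of a pattern node's children can be embedded below it, so the work grows exponentially with the pattern's outdegree, which is why a constant bound on that outdegree matters.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    """A rooted, node-labeled tree node (hypothetical helper type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def find_inclusions(pattern: Node, text: Node) -> List[Node]:
    """Return every text node v such that the pattern can be embedded in the
    subtree of v with the pattern root mapped to v (exact labels only)."""
    # Flatten the pattern so that each pattern node gets an integer index.
    p_nodes: List[Node] = []

    def collect(u: Node) -> None:
        p_nodes.append(u)
        for c in u.children:
            collect(c)

    collect(pattern)
    index = {id(u): i for i, u in enumerate(p_nodes)}
    hits: List[Node] = []

    def visit(v: Node) -> Tuple[List[Set[int]], List[bool]]:
        # Process the text tree bottom-up (postorder).
        child_tables = [visit(w) for w in v.children]
        below: List[Set[int]] = []   # below[i]: bitmasks of children of p_nodes[i]
        rooted: List[bool] = []      # rooted[i]: p_nodes[i] embeds with root at v
        for i, u in enumerate(p_nodes):
            acc = {0}  # the empty subset always embeds
            for child_below, child_rooted in child_tables:
                # Subsets that fit strictly below this child of v ...
                options = set(child_below[i])
                # ... or a single pattern child mapped onto that child itself.
                for j, c in enumerate(u.children):
                    if child_rooted[index[id(c)]]:
                        options.add(1 << j)
                # Combine disjoint subsets across sibling subtrees of the text.
                acc = {a | b for a in acc for b in options if a & b == 0}
            below.append(acc)
            full = (1 << len(u.children)) - 1
            rooted.append(u.label == v.label and full in acc)
        if rooted[0]:
            hits.append(v)
        return below, rooted

    visit(text)
    return hits


# Example: the pattern a(b, c) is included in the text a(x(b, c), b).
example_pattern = Node("a", [Node("b"), Node("c")])
example_text = Node("a", [Node("x", [Node("b"), Node("c")]), Node("b")])
example_hits = find_inclusions(example_pattern, example_text)
```

In the example, deleting the node labeled `x` from the text lifts its children up to the root, so the pattern is found at the text root. A cost-aware variant in the spirit of the abstract would replace the boolean tables with minimum costs, but that design is an assumption here and is not taken from the source.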