ABSTRACT:

Several systems that rely on consistent data to offer high-quality services, such as digital libraries and e-commerce brokers, may be affected by the existence of duplicates, quasi-replicas, or near-duplicate entries in their repositories. Because of that, private and government organizations have invested significantly in developing methods for removing replicas from their data repositories. Clean, replica-free repositories not only allow the retrieval of higher-quality information but also lead to more concise data and to potential savings in the computational time and resources needed to process this data. In this paper, we propose a genetic programming approach to record deduplication that combines several different pieces of evidence extracted from the data content to find a deduplication function that is able to identify whether two entries in a repository are replicas or not. As shown by our experiments, our approach outperforms an existing state-of-the-art method found in the literature. Moreover, the suggested functions are computationally less demanding since they use less evidence. In addition, our genetic programming approach is capable of automatically adapting these functions to a given fixed replica identification boundary, freeing the user from the burden of having to choose and tune this parameter.
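To make the idea of a deduplication function concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that a piece of evidence is the similarity of one attribute across two records, that a candidate function is an arithmetic combination of those values, and that a pair is flagged as a replica when the result reaches a fixed boundary. The attribute names, the similarity measure (`difflib.SequenceMatcher`), the hand-written `candidate_function`, the `BOUNDARY` value, and the F1-based fitness are all illustrative assumptions; in a genetic programming setting, the search would evolve many such candidate functions and keep those with the best fitness.

```python
from difflib import SequenceMatcher

# Assumed attribute names, chosen only for illustration.
ATTRIBUTES = ("title", "author")

# Assumed fixed replica identification boundary.
BOUNDARY = 2.0


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evidence(rec_a: dict, rec_b: dict) -> dict:
    """One piece of evidence per attribute: the similarity of its two values."""
    return {attr: similarity(rec_a[attr], rec_b[attr]) for attr in ATTRIBUTES}


def candidate_function(e: dict) -> float:
    """A hypothetical individual that a genetic programming search might evolve."""
    return 2.0 * e["title"] + e["author"]


def is_replica(rec_a: dict, rec_b: dict) -> bool:
    """Flag the pair as a replica when the combined evidence reaches the boundary."""
    return candidate_function(evidence(rec_a, rec_b)) >= BOUNDARY


def fitness(labeled_pairs: list) -> float:
    """F1 score of the candidate function over (rec_a, rec_b, is_duplicate) triples."""
    tp = fp = fn = 0
    for rec_a, rec_b, truth in labeled_pairs:
        predicted = is_replica(rec_a, rec_b)
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth and not predicted:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy records invented for this example.
original = {"title": "Genetic programming for record deduplication", "author": "M. Silva"}
near_dup = {"title": "Genetic programing for record deduplication", "author": "M Silva"}
unrelated = {"title": "Thinned array synthesis", "author": "Wong"}

labeled_pairs = [
    (original, near_dup, True),
    (original, unrelated, False),
]
score = fitness(labeled_pairs)
```

Because the boundary stays fixed, only the shape of `candidate_function` would change during the search, which matches the abstract's point that the user does not need to tune this parameter.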




