Crowdsourcing to Clean Uncertain Data: A General Model with Varying Accuracy Rates

PROJECT TITLE: Cleaning Uncertain Data with Crowdsourcing - a General Model with Diverse Accuracy Rates

ABSTRACT: Data uncertainty has become a significant challenge for database management systems because errors are pervasive across many applications. Probabilistic databases address the problem by storing uncertain data and providing query facilities that return answers with confidence values. However, as uncertainty propagates through a system, the results of a query or mining process may no longer be reliable. In this paper, we harness crowdsourcing by designing a set of Human Intelligence Tasks (HITs) that ask a crowd to improve the quality of uncertain data. In particular, we account for the fact that a crowd consists of workers whose accuracy rates vary when answering HITs. We devise solutions that aim to achieve the highest possible data quality while reducing the total number of HITs. This optimization is non-trivial: two challenges make choosing the best set of HITs computationally very expensive. First, a crowd may provide incorrect answers, with probabilities that vary from worker to worker. Second, the HITs decomposed from uncertain data are often strongly correlated with one another. We address these challenges with an efficient approximation algorithm and an effective heuristic solution, designed in particular for crowds with varying individual accuracy rates. To further improve efficiency, we derive tight lower and upper bounds for effective filtering and estimation.
To assess the effectiveness of our solutions, we run extensive experiments on both a simulated crowd and a real crowdsourcing platform.
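The abstract does not spell out the underlying model, so the following is only a minimal sketch of how one worker's answer with a known accuracy rate might update an uncertain fact. It assumes a yes/no HIT about a single fact, a worker who answers correctly with probability `accuracy`, and Shannon entropy as a stand-in for data quality. The function names and the entropy-based measure are illustrative assumptions, not the paper's algorithm, and the sketch ignores the correlations between HITs that the abstract highlights.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a fact that is true with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def posterior(prior: float, answer: bool, accuracy: float) -> float:
    """Bayes update of P(fact is true) after one worker's yes/no answer.

    Assumption: the worker reports the truth with probability `accuracy`,
    regardless of whether the fact is actually true or false.
    """
    if answer:
        like_true, like_false = accuracy, 1.0 - accuracy
    else:
        like_true, like_false = 1.0 - accuracy, accuracy
    evidence = prior * like_true + (1.0 - prior) * like_false
    if evidence == 0.0:
        # The answer was impossible under the model; keep the prior.
        return prior
    return prior * like_true / evidence


def expected_entropy_after(prior: float, accuracy: float) -> float:
    """Expected remaining entropy if this HIT is sent to one worker."""
    p_yes = prior * accuracy + (1.0 - prior) * (1.0 - accuracy)
    p_no = 1.0 - p_yes
    return (p_yes * binary_entropy(posterior(prior, True, accuracy))
            + p_no * binary_entropy(posterior(prior, False, accuracy)))
```

Under these assumptions, a "yes" from a worker with accuracy 0.8 moves a prior of 0.5 to about 0.8, while a worker with accuracy 0.5 leaves the expected entropy unchanged. Comparing `expected_entropy_after` across candidate HITs and workers is one simple way to rank them, which is why varying accuracy rates matter when choosing HITs.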