FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce - 2016
Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemsets mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree rather than conventional FP trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets, and the reducers perform combination operations by constructing small ultrametric trees and then mine these trees separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop's performance, we develop a workload balance metric to measure load balance across the cluster's computing nodes. We develop FiDoop-HD, an extension of FiDoop, to speed up the mining performance for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
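The three-job pipeline described above can be illustrated with a minimal, single-process sketch. This is not the authors' implementation: the function names (job1_frequent_items, job2_prune, job3_map, job3_reduce, mine) are hypothetical, the MapReduce shuffle is simulated with a plain dictionary, and a Counter stands in for the small ultrametric tree that each reducer would build. It also assumes that the first two jobs find the frequent items and prune infrequent items from each transaction, which the abstract does not spell out.

```python
from collections import Counter
from itertools import combinations


def job1_frequent_items(transactions, min_support):
    """Job 1 (assumed role): count each item once per transaction, keep frequent ones."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: n for item, n in counts.items() if n >= min_support}


def job2_prune(transactions, frequent_items):
    """Job 2 (assumed role): drop infrequent items and count identical pruned itemsets."""
    pruned = Counter()
    for t in transactions:
        kept = tuple(sorted(i for i in set(t) if i in frequent_items))
        if kept:
            pruned[kept] += 1
    return pruned


def job3_map(itemset, count):
    """Job 3 mapper: decompose one k-itemset into its sub-itemsets of sizes 2..k.

    The key is the sub-itemset size, so one reducer handles one size.
    """
    for size in range(2, len(itemset) + 1):
        for subset in combinations(itemset, size):
            yield size, (subset, count)


def job3_reduce(pairs, min_support):
    """Job 3 reducer: combine counts for one size and keep the frequent itemsets.

    A Counter is used here in place of a small ultrametric tree.
    """
    support = Counter()
    for subset, count in pairs:
        support[subset] += count
    return {s: n for s, n in support.items() if n >= min_support}


def mine(transactions, min_support):
    """Run the three simulated jobs and return {itemset: support}."""
    frequent_items = job1_frequent_items(transactions, min_support)
    pruned = job2_prune(transactions, frequent_items)

    # Simulated shuffle: group mapper output by key (sub-itemset size).
    shuffled = {}
    for itemset, count in pruned.items():
        for size, pair in job3_map(itemset, count):
            shuffled.setdefault(size, []).append(pair)

    result = {(item,): n for item, n in frequent_items.items()}
    for size in sorted(shuffled):
        result.update(job3_reduce(shuffled[size], min_support))
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    for itemset, support in sorted(mine(data, min_support=3).items()):
        print(itemset, support)
```

In this sketch a k-itemset yields 2^k - k - 1 sub-itemsets of sizes 2 through k, so mapper work grows quickly with itemset length. This is one way to see why decomposition cost depends on itemset length and why load can become unbalanced across nodes.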