Speed Up Big Data Analytics by Unveiling the Storage Distribution of Sub-Datasets - 2018


In this project, we study the problem of sub-dataset analysis over distributed file systems, e.g., the Hadoop file system (HDFS). Our experiments show that the distribution of sub-datasets over HDFS blocks, which is hidden by HDFS, can often cause the corresponding analyses to suffer from seriously imbalanced or inefficient parallel execution. Specifically, the content clustering of sub-datasets results in some computational nodes carrying out much more workload than others; furthermore, it leads to inefficient sampling of sub-datasets, as analysis programs will often read large amounts of irrelevant data. We conduct a comprehensive analysis of how imbalanced computing patterns and inefficient sampling occur. We then propose a storage-distribution-aware method, referred to as DataNet, to optimize sub-dataset analysis over distributed storage systems. First, we propose an efficient algorithm to obtain the meta-data of sub-dataset distributions. Second, we design an elastic storage structure called ElasticMap, based on the HashMap and BloomFilter techniques, to store the meta-data. Third, we employ distribution-aware algorithms for sub-dataset applications to achieve balanced and efficient parallel execution. Our proposed method can benefit different sub-dataset analyses with various computational requirements. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results show the performance benefits of DataNet.
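The abstract names the building blocks of ElasticMap but gives no implementation details, so the following is only a minimal sketch of how a HashMap and a Bloom filter could be combined to record which blocks hold which sub-datasets. It assumes that sub-datasets occupying at least `threshold` bytes of a block are stored exactly in a dictionary, while smaller ones are only recorded as members of a per-block Bloom filter. The class names `BloomFilter` and `ElasticMapSketch`, the `threshold` parameter, and the greedy `assign_blocks` balancing helper are all hypothetical illustrations, not the authors' algorithms.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter that uses a Python int as its bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ElasticMapSketch:
    """Hypothetical per-block metadata store (assumed design, see lead-in)."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed size cutoff in bytes
        self.exact = {}             # block_id -> {sub_id: size in bytes}
        self.small = {}             # block_id -> BloomFilter of small sub-datasets

    def add_block(self, block_id, sizes):
        """Record one block; sizes maps sub_id -> bytes held in that block."""
        exact, bloom = {}, BloomFilter()
        for sub_id, size in sizes.items():
            if size >= self.threshold:
                exact[sub_id] = size   # large share: keep the exact size
            else:
                bloom.add(sub_id)      # small share: keep membership only
        self.exact[block_id] = exact
        self.small[block_id] = bloom

    def blocks_for(self, sub_id):
        """Map block_id -> exact size, or None if only membership is known."""
        result = {}
        for block_id, exact in self.exact.items():
            if sub_id in exact:
                result[block_id] = exact[sub_id]
            elif sub_id in self.small[block_id]:
                result[block_id] = None
        return result


def assign_blocks(block_sizes, num_workers):
    """Generic greedy balancing: largest block first, to the least-loaded worker."""
    loads = [0] * num_workers
    assignment = [[] for _ in range(num_workers)]
    for block_id, size in sorted(block_sizes.items(), key=lambda kv: -kv[1]):
        worker = loads.index(min(loads))
        assignment[worker].append(block_id)
        loads[worker] += size
    return assignment, loads


emap = ElasticMapSketch(threshold=100)
emap.add_block("b1", {"movieA": 500, "movieB": 10})
emap.add_block("b2", {"movieA": 50})
emap.add_block("b3", {"movieC": 300})
found = emap.blocks_for("movieA")
assignment, loads = assign_blocks({"b1": 500, "b2": 100, "b3": 400, "b4": 200}, 2)
```

With metadata of this kind, an analysis job could skip blocks that do not contain the target sub-dataset and hand the per-block sizes to a balancing step, rather than scanning every block and discovering the skew at run time. The Bloom filter trades exactness for space: it can occasionally report a block that does not actually hold the sub-dataset, but it never misses one that does.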


