Efficient Skew Handling for Outer Joins in a Cloud Computing Environment - 2018


Outer joins are ubiquitous in many workloads and Big Data systems. How best to execute outer joins in large parallel systems is a particularly difficult question, because real-world datasets are characterized by data skew, which causes performance problems. Although skew handling techniques have been extensively studied for inner joins, there is little published work addressing the corresponding problem for parallel outer joins, especially in the extremely popular Cloud Computing environment. Standard approaches to the problem, such as those based on hash redistribution, typically lead to load-balancing issues, whereas duplication-based approaches incur significant overhead in terms of network communication. In this project, we propose a novel approach for efficient skew handling in outer joins in a Cloud Computing environment. We present an efficient implementation of our approach on the Spark framework. We evaluate the performance of our approach on a 192-core system with large test datasets in excess of 100 GB and with varying skew. Experimental results show that our approach is scalable and, at least in cases of high skew, significantly faster than the state of the art.
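
The trade-off between the two standard approaches can be made concrete with a small simulation. The Python sketch below runs a left outer join on a hypothetical 4-worker cluster where one join key is heavily skewed. It illustrates only the two baselines named above (hash redistribution and duplication, here as a broadcast of the smaller input). It is not the authors' proposed approach and does not use Spark. The function names, the key-modulo stand-in for a hash function, and the data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

NUM_WORKERS = 4  # hypothetical cluster size for this illustration


def worker_of(key, num_workers=NUM_WORKERS):
    # Stand-in hash function (assumption): integer key modulo worker count.
    return key % num_workers


def local_left_outer_join(left, right):
    """Left outer join of two lists of (key, value) pairs on one worker."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    out = []
    for k, v in left:
        matches = index.get(k)
        if matches:
            out.extend((k, v, w) for w in matches)
        else:
            out.append((k, v, None))  # dangling left tuple padded with None
    return out


def hash_redistribution_join(left, right, n=NUM_WORKERS):
    """Partition both inputs by join key, then join each partition locally."""
    left_parts = [[] for _ in range(n)]
    right_parts = [[] for _ in range(n)]
    for row in left:
        left_parts[worker_of(row[0], n)].append(row)
    for row in right:
        right_parts[worker_of(row[0], n)].append(row)
    # Load = number of tuples each worker must process.
    loads = [len(lp) + len(rp) for lp, rp in zip(left_parts, right_parts)]
    result = [t for lp, rp in zip(left_parts, right_parts)
              for t in local_left_outer_join(lp, rp)]
    return result, loads


def broadcast_join(left, right, n=NUM_WORKERS):
    """Leave left in place (round-robin) and copy all of right to every worker."""
    left_parts = [left[i::n] for i in range(n)]
    loads = [len(part) + len(right) for part in left_parts]
    tuples_sent = len(right) * n  # every right tuple is shipped to every worker
    result = [t for part in left_parts
              for t in local_left_outer_join(part, right)]
    return result, loads, tuples_sent


# Skewed example: 900 of the 1000 left tuples share join key 0.
left = [(0, i) for i in range(900)] + [(k, 900 + k) for k in range(1, 101)]
right = [(k, f"r{k}") for k in range(50)]

hash_result, hash_loads = hash_redistribution_join(left, right)
bcast_result, bcast_loads, sent = broadcast_join(left, right)

print(hash_loads)         # [938, 38, 37, 37]
print(bcast_loads, sent)  # [300, 300, 300, 300] 200
assert Counter(hash_result) == Counter(bcast_result)  # same join output
```

Under hash redistribution, every tuple with the skewed key 0 lands on worker 0, so one worker processes far more tuples than the others. Broadcasting the smaller right input balances the load, but it ships every right tuple to every worker, which becomes expensive as that input or the cluster grows. Duplication is straightforward here only because the duplicated side is the non-preserved one. If the preserved (left) side were duplicated, each worker lacking a matching right tuple would emit a None-padded row, even when a match exists on another worker, so extra coordination would be needed to produce the correct result.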
