PROJECT TITLE:
Optimizing big data processing performance in the public cloud: opportunities and approaches
Today's rapid data generation from diverse sources calls for efficient big data processing, which imposes unprecedented demands on computing and networking infrastructures. State-of-the-art tools, most notably MapReduce, typically run on dedicated server clusters to exploit data parallelism. For grassroots users or non-computing professionals, the cost of deploying and maintaining a large-scale dedicated server cluster can be prohibitively high, not to mention the technical skills required. Public clouds, on the other hand, let general users rent virtual machines (VMs) and run their applications in a pay-as-you-go manner, offering high scalability with minimal upfront cost. This new computing paradigm has gained tremendous success in recent years, becoming a highly attractive alternative to dedicated server clusters. This article discusses the key challenges and opportunities that arise when big data meets the public cloud. We identify the key differences between running big data processing in a public cloud and in dedicated server clusters. We then present two important issues for efficient big data processing in the public cloud: resource provisioning (i.e., how to rent VMs) and VM-MapReduce job/task scheduling (i.e., how to run MapReduce once the VMs are created). Each of these two questions involves a set of problems to solve. We present solution approaches for some of these problems and offer optimized design guidelines for the others. Finally, we discuss our implementation experiences.
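To make the MapReduce pattern mentioned above concrete, here is a minimal, illustrative sketch of its two phases: a map step run in parallel over data partitions, followed by a reduce step that merges the partial results. This toy word count is an assumption-level example, not the article's system; frameworks such as Hadoop run the same phases across a cluster of machines (or rented VMs) rather than local threads.

```python
# Toy MapReduce word count: map over partitions in parallel, then reduce.
from collections import Counter
from multiprocessing.pool import ThreadPool

def map_phase(partition: str) -> Counter:
    """Map: emit per-word counts for one input partition."""
    return Counter(partition.split())

def reduce_phase(partials: list[Counter]) -> Counter:
    """Reduce: merge the partial counts produced by all mappers."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

if __name__ == "__main__":
    # Two input partitions standing in for data split across workers.
    partitions = ["big data big cloud", "cloud data data"]
    with ThreadPool(2) as pool:          # mappers run concurrently
        partials = pool.map(map_phase, partitions)
    totals = reduce_phase(partials)       # word -> total count
    print(totals)
```

In a public-cloud deployment, the provisioning question is how many VM "workers" to rent for the map phase, and the scheduling question is how map and reduce tasks are assigned to those VMs, which is exactly the pair of issues the article studies.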