Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Data Centers


The use of MapReduce frameworks to analyze ever-growing volumes of data is expected to keep rising as large-scale data analysis becomes standard practice across industries. As a direct result, there is growing interest in moving MapReduce into multi-tenant clouds. However, the time-varying network bandwidth of a shared cluster can significantly affect MapReduce application performance. Although many recent studies have shown that dynamic scheduling can improve MapReduce performance by cutting down on shuffle traffic, most of them fail to account for the influence of the hierarchical network architectures already prevalent in data centers. In this article, we propose and design a hierarchical topology (Hit) aware MapReduce scheduler with the goal of minimizing overall data traffic cost and, as a result, reducing job completion time. We first formulate the problem as a topology-aware assignment (TAA) optimization problem that takes into account the dynamic computing and communication resources offered by the cloud along with its hierarchical network architecture. We then develop a synergistic strategy that solves the TAA problem using stable matching theory, which respects the preferences of both individual tasks and hosting machines and allows the problem to be solved more efficiently. Finally, we evaluate the proposed scheduler through testbed experiments and simulations on Hadoop YARN, where it has been implemented as a pluggable module. The testbed results indicate that Hit-scheduler reduces job completion time by 28 percent compared with Capacity Scheduler and by 11 percent compared with Probabilistic Network-Aware scheduler.
In addition, our simulation results show that Hit-scheduler can cut the traffic cost by as much as 38 percent and the average shuffle flow traffic time by 32 percent compared with Capacity Scheduler. In this article, we also extend Hit-scheduler to a decentralized heuristic scheme that performs policy-aware allocation in data center environments. Data centers typically contain a large number of servers, containers, switches, and traffic flows, so many of the currently available centralized approximation methods are too complex to be feasible at that scale. In the extension, we design a decentralized heuristic scheme that performs Policy-Aware Task (PAT) allocation by building on an existing centralized algorithm, with the goal of approximately maximizing the total gained utility. Finally, the simulation-based experiments show that the proposed PAT policy reduces the communication cost in data centers by 33.6 percent compared with the default scheduler.
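The idea of matching tasks to machines over a hierarchical topology can be illustrated with a minimal sketch. It is not the Hit-scheduler implementation: the hop weights (0, 2, 4, 6), the `(pod, rack, host)` host naming, the cost definition (data volume times hop count), and the function names `hops`, `traffic_cost`, and `stable_assign` are all assumptions made for illustration. The matching itself is the standard many-to-one deferred-acceptance procedure, with tasks proposing to hosts in order of increasing traffic cost and each host keeping the cheapest tasks that fit in its slots.

```python
from collections import deque


def hops(a, b):
    """Hop count between two hosts named as (pod, rack, host) tuples.

    The weights assume a three-tier tree (rack, aggregation, core);
    they are illustrative, not taken from the article.
    """
    if a == b:
        return 0
    if a[:2] == b[:2]:
        return 2  # same rack: up to the top-of-rack switch and back
    if a[0] == b[0]:
        return 4  # same pod: via an aggregation switch
    return 6      # different pods: via the core layer


def traffic_cost(host, inputs):
    """Cost of running a task on `host`; inputs is [(source_host, megabytes)]."""
    return sum(mb * hops(host, src) for src, mb in inputs)


def stable_assign(tasks, capacity):
    """Deferred acceptance with tasks proposing to hosts.

    tasks:    {task_name: [(source_host, megabytes), ...]}
    capacity: {host: number_of_free_slots}
    Returns   {task_name: host} for every task that obtained a slot.
    """
    cost = {t: {h: traffic_cost(h, inp) for h in capacity}
            for t, inp in tasks.items()}
    # Each task ranks hosts by increasing traffic cost (ties broken by name).
    prefs = {t: deque(sorted(capacity, key=lambda h: (cost[t][h], h)))
             for t in tasks}
    assigned = {h: [] for h in capacity}
    free = deque(sorted(tasks))
    while free:
        t = free.popleft()
        if not prefs[t]:
            continue  # every host rejected this task; it stays unassigned
        h = prefs[t].popleft()
        assigned[h].append(t)
        if len(assigned[h]) > capacity[h]:
            # The host rejects the task that is most expensive to run on it.
            worst = max(assigned[h], key=lambda x: (cost[x][h], x))
            assigned[h].remove(worst)
            free.append(worst)
    return {t: h for h, ts in assigned.items() for t in ts}
```

The result is stable in the usual sense: no task and host would both prefer each other over their current assignment. Stability does not by itself guarantee the minimum total traffic cost, so a sketch like this should be read as an illustration of the matching step rather than of the full TAA optimization.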


