Joint MapReduce Scheduling and Network Policy Optimization in Hierarchical Data Centers

PROJECT TITLE: Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Data Centers

ABSTRACT: As large-scale data analysis becomes standard practice across industries, MapReduce frameworks are being used to analyze ever-increasing volumes of data, and there is growing interest in moving MapReduce into multi-tenant clouds. In a shared cluster, however, time-varying network bandwidth can significantly affect MapReduce application performance. Although many recent studies have shown that dynamic scheduling can improve MapReduce performance by reducing shuffle traffic, most of them ignore the influence of the hierarchical network architectures prevalent in data centers. In this article, we propose and design a hierarchical topology (Hit) aware MapReduce scheduler that aims to minimize the overall data traffic cost and thereby reduce job completion time. We first formulate a topology-aware assignment (TAA) optimization problem that accounts for the dynamic computing and communication resources of the cloud together with its hierarchical network architecture. We then develop a synergistic strategy that solves the TAA problem efficiently using stable matching theory, which respects the preferences of both individual tasks and hosting machines. Finally, we implement the proposed scheduler as a pluggable module on Hadoop YARN and evaluate it through testbed experiments and simulations.
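The stable-matching idea behind the TAA formulation can be illustrated with a small sketch. This is not the paper's algorithm or cost model. It is a generic task-proposing deferred-acceptance (Gale-Shapley style) procedure under assumed conditions: a three-tier tree in which traffic costs 0, 2, 4, or 6 hops depending on whether two hosts share a machine, rack, pod, or nothing; rack identifiers that are globally unique; tasks that prefer hosts with a lower volume-weighted hop cost to their input; and hosts that prefer tasks that are cheaper to serve, breaking ties in favor of larger inputs. The names `Host`, `Task`, `hop_cost`, and `stable_assign` are hypothetical.

```python
from collections import namedtuple

# Hypothetical data model: rack ids are assumed to be globally unique.
Host = namedtuple("Host", "name rack pod slots")
Task = namedtuple("Task", "name source volume")  # source = host holding the input


def hop_cost(a, b):
    """Assumed hop count between two hosts in a three-tier tree."""
    if a.name == b.name:
        return 0
    if a.rack == b.rack:
        return 2
    if a.pod == b.pod:
        return 4
    return 6


def stable_assign(tasks, hosts):
    """Task-proposing deferred acceptance; returns {task name: host name}."""
    by_name = {h.name: h for h in hosts}

    def cost(t, h):
        # Volume-weighted hop cost of running task t on host h.
        return t.volume * hop_cost(by_name[t.source], h)

    # Each task ranks hosts from cheapest to most expensive.
    prefs = {t.name: sorted(hosts, key=lambda h: (cost(t, h), h.name))
             for t in tasks}
    next_idx = {t.name: 0 for t in tasks}
    held = {h.name: [] for h in hosts}
    free = list(tasks)

    while free:
        t = free.pop()
        if next_idx[t.name] >= len(hosts):
            continue  # task has been rejected everywhere; stays unassigned
        h = prefs[t.name][next_idx[t.name]]
        next_idx[t.name] += 1
        held[h.name].append(t)
        if len(held[h.name]) > h.slots:
            # Host rejects its least preferred task: highest cost,
            # then smallest volume, then name as a final tie-break.
            worst = max(held[h.name],
                        key=lambda x: (cost(x, h), -x.volume, x.name))
            held[h.name].remove(worst)
            free.append(worst)

    return {t.name: hname for hname, ts in held.items() for t in ts}


def total_cost(assignment, tasks, hosts):
    """Sum of volume-weighted hop costs for a given assignment."""
    by_name = {h.name: h for h in hosts}
    return sum(t.volume * hop_cost(by_name[t.source], by_name[assignment[t.name]])
               for t in tasks if t.name in assignment)


hosts = [Host("h1", "r1", "p1", 1), Host("h2", "r1", "p1", 1),
         Host("h3", "r2", "p1", 1)]
tasks = [Task("t1", "h1", 10), Task("t2", "h1", 5), Task("t3", "h3", 8)]
assignment = stable_assign(tasks, hosts)
print(assignment, total_cost(assignment, tasks, hosts))
```

In this toy instance two tasks compete for the single slot on `h1`. The host keeps the task with the larger input and the other falls back to a machine in the same rack, which is the kind of two-sided preference the abstract attributes to stable matching. A stable matching is not guaranteed to minimize total cost in general.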
The testbed experiments show that Hit-scheduler reduces job completion time by 28 percent compared with Capacity Scheduler and by 11 percent compared with Probabilistic Network-Aware scheduler. The simulations further show that, compared with Capacity Scheduler, Hit-scheduler can cut traffic cost by up to 38 percent and average shuffle flow traffic time by 32 percent. In this article we also extend Hit-scheduler to a decentralized heuristic scheme that performs policy-aware allocation in data center environments. Because data centers typically contain large numbers of servers, containers, switches, and traffic flows, many existing centralized approximation methods are too complex to be feasible there. The extension therefore performs Policy-Aware Task (PAT) allocation with a decentralized heuristic that builds on an existing centralized algorithm and aims to approximately maximize the total gained utility. Simulation-based experiments show that the proposed PAT policy reduces communication cost in data centers by 33.6% compared with the default scheduler.