On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications - 2015
The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance enhancement. Traditionally, a hash function is used to partition intermediate data among reduce tasks; this is not traffic-efficient, because neither the network topology nor the data size associated with each key is taken into account. In this paper, we study how to reduce the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce the merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to deal with the large-scale optimization problem for big data applications, and an online algorithm is also designed to adjust data partition and aggregation dynamically. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both offline and online cases.
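To make the contrast between hash partitioning and traffic-aware partitioning concrete, here is a minimal Python sketch. It is not the paper's algorithm: it uses a simple per-key greedy choice, ignores reducer load balancing and aggregator placement, and all names (`hash_partition`, `traffic_aware_partition`, `distance`, `sizes`) and all numbers are hypothetical, chosen only for illustration. Traffic cost is assumed to be data size multiplied by the hop distance between a map task and a reduce task.

```python
import zlib


def hash_partition(key, num_reducers):
    """Traditional scheme: reducer chosen from the key's hash alone."""
    # crc32 is used instead of hash() so results are stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def key_cost(sizes_by_mapper, reducer, distance):
    """Traffic cost of sending one key's data from every mapper to a reducer."""
    return sum(size * distance[m][reducer] for m, size in sizes_by_mapper.items())


def total_cost(assignment, sizes, distance):
    """Total shuffle traffic cost of a key-to-reducer assignment."""
    return sum(key_cost(sizes[k], r, distance) for k, r in assignment.items())


def traffic_aware_partition(sizes, distance, num_reducers):
    """Greedy sketch (an assumption, not the paper's method):
    send each key to the reducer with the lowest traffic cost for that key."""
    assignment = {}
    for key, by_mapper in sizes.items():
        assignment[key] = min(
            range(num_reducers),
            key=lambda r: key_cost(by_mapper, r, distance),
        )
    return assignment


# Hypothetical topology: distance[mapper][reducer] in network hops.
distance = [
    [1, 3],  # mapper 0 is close to reducer 0
    [3, 1],  # mapper 1 is close to reducer 1
]

# Hypothetical intermediate data: sizes[key][mapper] in MB.
sizes = {
    "apple": {0: 90, 1: 10},
    "banana": {0: 5, 1: 80},
    "cherry": {0: 40, 1: 40},
}

num_reducers = 2
hash_assignment = {k: hash_partition(k, num_reducers) for k in sizes}
aware_assignment = traffic_aware_partition(sizes, distance, num_reducers)

hash_cost = total_cost(hash_assignment, sizes, distance)
aware_cost = total_cost(aware_assignment, sizes, distance)

print("hash partition cost:", hash_cost)
print("traffic-aware cost: ", aware_cost)
```

Because the greedy rule picks the cheapest reducer for each key independently, its total cost can never exceed that of the hash assignment under this cost model. A realistic scheme would also have to balance reducer load and decide where to place aggregators, which is the joint problem the paper addresses.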