PROJECT TITLE :
RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's Configuration
Hadoop is a widely used implementation framework of the MapReduce programming model for large-scale data processing. Hadoop performance, however, is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming, if at all practical. This paper proposes an approach, called RFHOC, to automatically tune the Hadoop configuration parameters for optimized performance for a given application running on a given cluster. RFHOC constructs two ensembles of performance models using a random-forest approach, one for the map stage and one for the reduce stage. Leveraging these models, RFHOC employs a genetic algorithm to automatically search the Hadoop configuration space. The evaluation of RFHOC using five typical Hadoop programs, each with five different input data sets, shows that it achieves a performance speedup by a factor of 2.11 on average and up to 7.4 over the recently proposed cost-based optimization (CBO) approach. In addition, RFHOC's performance benefit increases with input data set size.
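The model-guided search described above can be illustrated with a minimal sketch. It is not the RFHOC implementation: the abstract does not say which parameters are tuned, how the random-forest models are trained, or how the genetic algorithm is configured. The sketch assumes a small hypothetical search space of three Hadoop 1.x settings with made-up ranges, and it replaces the trained map-stage and reduce-stage models with a toy function, predict_runtime, so that the example runs with only the standard library. The genetic operators (uniform crossover, random-reset mutation, elitism) are likewise illustrative choices.

```python
import random

# Hypothetical search space: parameter name -> (low, high) integer range.
# The ranges are illustrative only and are not taken from the paper.
SPACE = {
    "io.sort.mb": (50, 500),
    "io.sort.factor": (10, 100),
    "mapred.reduce.tasks": (1, 64),
}


def predict_runtime(cfg):
    """Toy stand-in for the trained performance models (an assumption).

    In the approach described above, this prediction would come from the
    random-forest ensembles for the map and reduce stages.
    """
    map_time = abs(cfg["io.sort.mb"] - 300) / 10 + abs(cfg["io.sort.factor"] - 60) / 5
    reduce_time = abs(cfg["mapred.reduce.tasks"] - 32)
    return map_time + reduce_time


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one of the two parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(cfg, rng, rate=0.2):
    """Reset each parameter to a random in-range value with probability `rate`."""
    out = dict(cfg)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def genetic_search(pop_size=30, generations=40, elite=5, seed=0):
    """Search for the configuration with the lowest predicted runtime."""
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=predict_runtime)      # best (lowest predicted time) first
        parents = pop[: pop_size // 2]     # the top half may reproduce
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children       # elitism keeps the best unchanged
    return min(pop, key=predict_runtime)


best = genetic_search()
print(best, predict_runtime(best))
```

Because the elite configurations carry over unchanged, the best predicted runtime never gets worse from one generation to the next. In a real setting each candidate would be scored by the trained models rather than by running the job, which is what makes searching a large configuration space affordable.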