PROJECT TITLE:
aHDFS: An Erasure-Coded Data Archival System for Hadoop Clusters - 2017
In this paper, we propose an erasure-coded data archival system called aHDFS for Hadoop clusters, where RS(k + r, k) codes are used to archive data replicas in the Hadoop Distributed File System (HDFS). We develop two archival strategies in aHDFS, aHDFS-Grouping and aHDFS-Pipeline, to speed up the data archival process. aHDFS-Grouping, a MapReduce-based data archiving scheme, keeps each mapper's intermediate output key-value pairs in a local key-value store. With the local store in place, aHDFS-Grouping merges all intermediate key-value pairs that share the same key into a single key-value pair, then shuffles that single pair to the reducers, which generate the final parity blocks. aHDFS-Pipeline forms a data archival pipeline across multiple data nodes in a Hadoop cluster: each node delivers its merged key-value pair to the next node's local key-value store, and the last node in the pipeline is responsible for outputting the parity blocks. We implement aHDFS in a real-world Hadoop cluster. The experimental results show that aHDFS-Grouping and aHDFS-Pipeline speed up Baseline's shuffle and reduce phases by factors of 10 and 5, respectively. When the block size is larger than 32 MB, aHDFS improves the performance of HDFS-RAID and HDFS-EC by approximately 31.8 and 15.7 percent, respectively.
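Merging intermediate pairs before the shuffle works because Reed-Solomon parity is a linear combination of the data blocks over GF(2^8), where addition is byte-wise XOR. Partial parities computed on different nodes can therefore be added together in any order, either locally before the shuffle (the grouping idea) or hop by hop along a chain of nodes (the pipeline idea). The sketch below illustrates this under several assumptions that do not come from the abstract. It uses the field polynomial 0x11D and illustrative Vandermonde-style coefficients 2^(i*j). It encodes a single stripe, so each key is simply a parity index; a real archive would presumably key on stripe and parity index. All function names (`map_node`, `merge_by_key`, `grouping`, `pipeline`) and the block placement are hypothetical. This is not the authors' implementation and does not check that its coefficient matrix can recover every loss pattern.

```python
from functools import reduce

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11D, assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def coeff(j: int, i: int) -> int:
    """Illustrative coefficient 2^(i*j) for parity j and data block i."""
    c = 1
    for _ in range(i * j):
        c = gf_mul(c, 2)
    return c

def scale(block: bytes, c: int) -> bytes:
    return bytes(gf_mul(x, c) for x in block)

def xor(a: bytes, b: bytes) -> bytes:
    # Addition in GF(2^8) is XOR, so partial parities combine byte-wise.
    return bytes(x ^ y for x, y in zip(a, b))

def encode_direct(data, r):
    """Reference encoder: parity_j = sum over i of coeff(j, i) * data[i]."""
    return {j: reduce(xor, (scale(b, coeff(j, i)) for i, b in enumerate(data)))
            for j in range(r)}

def map_node(local_blocks, r):
    """Hypothetical mapper: one (parity index, partial parity) pair per local block per parity."""
    for i, block in local_blocks.items():
        for j in range(r):
            yield j, scale(block, coeff(j, i))

def merge_by_key(pairs):
    """Local key-value store that folds every pair sharing a key into one pair."""
    store = {}
    for key, value in pairs:
        store[key] = xor(store[key], value) if key in store else value
    return store

def grouping(nodes, r):
    # Each node merges its own intermediate pairs before the shuffle;
    # a reducer then merges the shuffled pairs into the final parities.
    shuffled = [kv for node in nodes for kv in merge_by_key(map_node(node, r)).items()]
    return merge_by_key(shuffled), len(shuffled)

def pipeline(nodes, r):
    # Each node merges the pairs received from its predecessor with its own
    # and forwards the result; the last node ends up holding the parity blocks.
    carried = {}
    for node in nodes:
        carried = merge_by_key(list(carried.items()) + list(map_node(node, r)))
    return carried

data = [bytes([1, 2, 3, 4]), bytes([10, 20, 30, 40]),
        bytes([7, 7, 7, 7]), bytes([255, 0, 128, 64])]        # k = 4 data blocks
nodes = [{0: data[0], 1: data[1]}, {2: data[2]}, {3: data[3]}]  # assumed placement
r = 2

direct = encode_direct(data, r)
grouped, shuffled_pairs = grouping(nodes, r)
piped = pipeline(nodes, r)
baseline_pairs = sum(len(n) for n in nodes) * r  # pairs shuffled without local merging
print(grouped == direct, piped == direct)  # True True
print(baseline_pairs, shuffled_pairs)      # 8 6
```

In this toy setup, local merging only cuts the shuffled pairs from 8 to 6, because most nodes hold a single block. The saving grows with the number of blocks each mapper holds, and it says nothing about the speedups reported in the paper.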