Understanding Practical Tradeoffs in HPC Checkpoint-Scheduling Policies - 2018


As the scale of High-Performance Computing (HPC) clusters continues to grow, their increasing failure rates and energy consumption levels are emerging as serious design concerns. Efficiently running systems at such large scales critically relies on deploying effective, practical methods for fault tolerance while having a good understanding of their respective performance and energy overheads. The most commonly used fault tolerance technique is checkpoint/restart. Checkpoint scheduling policies, however, have traditionally been optimized and analyzed from a single angle: application performance. In this work, we provide an extensive analysis of the performance, energy and I/O costs associated with a wide array of checkpointing policies. We consider practical deployment issues and show that simple formulas can be used to accurately estimate wasted work in a system. We propose methods to optimize checkpoint scheduling for energy savings and evaluate the runtime-optimized and energy-optimized policies using simulations based on failure logs from ten production HPC clusters. Our results show ample room for achieving high-quality energy/performance tradeoffs when using methods that exploit characteristics of real-world failures. We also analyze the impact of energy-optimized checkpointing on the storage subsystem and identify policies that are optimal for I/O savings.


