On Fault Tolerance for Distributed Iterative Dataflow Processing - 2017


Large-scale graph and machine learning analytics widely employ distributed iterative processing. Typically, these analytics are part of a comprehensive workflow that includes data preparation, model building, and model evaluation. General-purpose distributed dataflow frameworks execute all steps of such workflows holistically. This holistic view enables these systems to reason about and automatically optimize the entire pipeline. Graph and machine learning analytics are known to incur long runtimes, since they require multiple passes over the data until convergence is reached. Thus, fault tolerance and fast recovery from any intermittent failure are critical for efficient analysis. In this paper, we propose novel fault-tolerance mechanisms for graph and machine learning analytics that run on distributed dataflow systems. We seek to reduce checkpointing costs and shorten failure recovery times. For graph processing, rather than writing checkpoints that block downstream operators, our mechanism writes checkpoints in an unblocking manner that does not break pipelined tasks. In contrast to the conventional approach to unblocking checkpointing (e.g., approaches that manage checkpoints independently for immutable datasets), we inject the checkpoints of mutable datasets into the iterative dataflow itself. Hence, our mechanism is iteration-aware by design. This simplifies the system architecture and facilitates coordinating checkpoint creation during iterative graph processing. Moreover, we are able to recover rapidly, via confined recovery, by exploiting the fact that log files exist locally on healthy nodes, thereby avoiding a complete recomputation from scratch. Furthermore, we propose replica recovery for machine learning algorithms, whereby we employ a broadcast variable that enables us to recover quickly without having to introduce any checkpoints.
To evaluate our fault tolerance strategies, we conduct both a theoretical study and experimental analyses using Apache Flink, and find that they outperform blocking checkpointing and complete recovery.
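
The checkpoint-free recovery idea for machine learning described above can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions and not the paper's implementation or Flink code: the function names, the one-parameter least-squares model, the learning rate, and the failure schedule are all hypothetical, and the failed worker's input partition is assumed to be re-readable from durable storage. The point it shows is that when every worker holds a broadcast copy of the model, a failed worker can take the current model from any healthy worker instead of from a checkpoint.

```python
# Toy simulation of recovery from a broadcast model replica.
# All names and values here are illustrative assumptions.

def local_gradient(w, partition):
    # Gradient of sum((w * x - y) ** 2) over one data partition.
    return sum(2 * (w * x - y) * x for x, y in partition)

def train(partitions, iterations, lr=0.01, fail_at=None, failed_worker=0):
    n = len(partitions)
    # Each worker holds its own replica of the broadcast model parameter.
    replicas = [0.0] * n
    for it in range(iterations):
        if it == fail_at:
            # Simulated failure: the worker loses its in-memory model.
            replicas[failed_worker] = None
            # Recovery: copy the model from any healthy worker's replica.
            healthy = next(i for i in range(n) if replicas[i] is not None)
            replicas[failed_worker] = replicas[healthy]
        # Each worker computes a gradient on its own partition.
        grads = [local_gradient(replicas[i], partitions[i]) for i in range(n)]
        # Aggregate the gradients and broadcast the updated model.
        new_w = replicas[0] - lr * sum(grads)
        replicas = [new_w] * n
    return replicas[0]

# Data generated from y = 2x, split across three simulated workers.
partitions = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(4.0, 8.0)]]
baseline = train(partitions, iterations=50)
recovered = train(partitions, iterations=50, fail_at=20, failed_worker=1)
print(baseline == recovered)
```

In this sketch the run with a simulated failure follows the same trajectory as the failure-free run, because the recovered worker resumes from the current iteration's model rather than restarting from scratch. A real system would also have to detect the failure, reschedule the task, and reload the lost input partition, none of which is modeled here.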

