PROJECT TITLE: Parallel Fractional Hot-Deck Imputation and Variance Estimation for Big Incomplete Data Curing

ABSTRACT: Fractional hot-deck imputation (FHDI) is a general-purpose, assumption-free method for handling multivariate missing data. It fills in each missing item with multiple observed values rather than resorting to artificially created values. The corresponding R package, FHDI [J. Im, I. Cho, and J. K. Kim, "An R package for fractional hot deck imputation," R J., vol. 10, no. 1, pp. 140–154, 2018], is general and efficient, but its excessive memory requirements and long running time make it unsuitable for large incomplete datasets. As a first step toward handling big incomplete data with the FHDI, we developed a parallel version of the fractional hot-deck imputation program (named P-FHDI) that is suitable for curing large incomplete datasets. Results show a favorable speedup when the P-FHDI is applied to big datasets with up to millions of instances or 10,000 variables. This paper explains the detailed parallel algorithms of the P-FHDI for datasets with many instances (big-n) or high dimensionality (big-p) and confirms the favorable scalability of the proposed approach. The proposed program inherits all the advantages of the serial FHDI and adds parallel variance estimation, which will benefit a broad audience in science and engineering.
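The core idea in the abstract, filling each missing item with several observed values instead of one synthetic value, can be illustrated with a minimal Python sketch. This is not the FHDI or P-FHDI algorithm: it assumes a single target variable, imputation cells given by an existing categorical field, a deterministic choice of the first few donors, and equal fractional weights. The names `fractional_hot_deck`, `weighted_mean`, `group`, and `y` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fractional_hot_deck(records, cell_key, target, max_donors=2):
    """Simplified, illustrative fractional hot-deck imputation (not the FHDI package).

    A record whose `target` is None is replaced by up to `max_donors` rows,
    each holding an observed donor value from the same cell and an equal
    fractional weight. Observed records keep weight 1.0.
    """
    # Collect observed values per imputation cell; these are the donors.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[cell_key]].append(rec[target])

    imputed = []
    for rec in records:
        if rec[target] is not None:
            imputed.append({**rec, "weight": 1.0})
            continue
        pool = donors.get(rec[cell_key], [])
        if not pool:
            raise ValueError(f"no observed donor in cell {rec[cell_key]!r}")
        # Assumption: take the first donors deterministically; a real
        # implementation would select donors by a sampling scheme.
        chosen = pool[:max_donors]
        for value in chosen:
            imputed.append({**rec, target: value, "weight": 1.0 / len(chosen)})
    return imputed


def weighted_mean(rows, target):
    """Mean of `target` using the fractional weights."""
    total_weight = sum(r["weight"] for r in rows)
    return sum(r["weight"] * r[target] for r in rows) / total_weight


# Hypothetical toy data: two cells, one missing value in each.
records = [
    {"group": "a", "y": 1.0},
    {"group": "a", "y": 3.0},
    {"group": "a", "y": None},
    {"group": "b", "y": 10.0},
    {"group": "b", "y": None},
]
rows = fractional_hot_deck(records, cell_key="group", target="y", max_donors=2)
print(weighted_mean(rows, "y"))
```

Because every imputed value is an actually observed value, the completed data contain no artificial values, and the fractional weights of each originally missing record sum to one, so each record still counts once in weighted estimates.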