PROJECT TITLE:
Parallel Fractional Hot-Deck Imputation and Variance Estimation for Big Incomplete Data Curing
ABSTRACT:
Fractional hot-deck imputation (FHDI) is a general-purpose, assumption-free method for handling multivariate missing data. It fills in each missing item with multiple observed values rather than artificially created values. The corresponding R package, FHDI (J. Im, I. Cho, and J. K. Kim, "An R package for fractional hot deck imputation," R J., vol. 10, no. 1, pp. 140–154, 2018), is general and efficient, but its excessive memory requirement and long running time make it unsuitable for big incomplete data. As a first step toward applying FHDI to big incomplete data, we developed a parallel fractional hot-deck imputation program, named P-FHDI, that is suitable for curing large incomplete datasets. Results show a favorable speedup when P-FHDI is applied to big datasets with up to millions of instances or 10,000 variables. This paper explains the detailed parallel algorithms of P-FHDI for datasets with many instances (big-n) or high dimensionality (big-p) and confirms the favorable scalability of the proposed approach. The proposed program inherits all the advantages of the serial FHDI and adds parallel variance estimation, which should benefit a broad audience in science and engineering.
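The core idea of filling each missing item with several observed values can be illustrated with a minimal serial sketch in Python. This is not the FHDI or P-FHDI implementation: it assumes a single variable, pre-assigned imputation cells, and equal fractional weights, and the names `fractional_hot_deck` and `weighted_mean` are hypothetical. The actual method covers multivariate data, and nothing here reproduces its weighting scheme, its variance estimation, or its parallelism.

```python
import random
from collections import defaultdict


def fractional_hot_deck(rows, m=2, seed=0):
    """Toy fractional hot-deck imputation (illustrative assumption only).

    rows: list of (cell, y) pairs, where y is None when missing.
    Returns a list of (row_index, value, fractional_weight) triples.
    Observed rows keep weight 1.0; each missing row is replaced by up to
    m observed donor values from the same cell, with equal weights that
    sum to 1.0.
    """
    rng = random.Random(seed)

    # Collect observed values (donors) per imputation cell.
    donors_by_cell = defaultdict(list)
    for cell, y in rows:
        if y is not None:
            donors_by_cell[cell].append(y)

    imputed = []
    for i, (cell, y) in enumerate(rows):
        if y is not None:
            imputed.append((i, y, 1.0))
            continue
        pool = donors_by_cell[cell]
        if not pool:
            raise ValueError(f"no donor available in cell {cell!r}")
        k = min(m, len(pool))
        # Draw k distinct donors; every imputed value is a real observation.
        for donor in rng.sample(pool, k):
            imputed.append((i, donor, 1.0 / k))
    return imputed


def weighted_mean(imputed):
    """Mean of the fractionally imputed data, using the fractional weights."""
    total_weight = sum(w for _, _, w in imputed)
    return sum(y * w for _, y, w in imputed) / total_weight


if __name__ == "__main__":
    data = [("a", 1.0), ("a", 3.0), ("a", None), ("b", 10.0), ("b", None)]
    result = fractional_hot_deck(data, m=2)
    for row in result:
        print(row)
    print("weighted mean:", weighted_mean(result))
```

In this sketch, the missing row in cell "a" receives both observed values from that cell with weight 0.5 each, while the missing row in cell "b" has only one possible donor and receives it with weight 1.0.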