Comparison of Estimation Methods for Missing Value Imputation of Gene Expression Data

Sarikas A., Odabasioglu N., ALTAY G.

Medical Technologies National Conference (TIPTEKNO), Antalya, Turkey, 27 - 29 October 2016

  • Publication Type: Conference Paper / Full Text
  • City: Antalya
  • Country: Turkey


Detecting and correcting missing values (imputation of MVs) is the first stage of preprocessing microarray datasets. This paper compares the most reliable and up-to-date estimation methods for imputing missing values. Imputation of MVs has very high priority because it affects the subsequent pre-processing and post-processing stages of microarray data analysis, namely quality control, normalization, differential gene expression analysis, classification, clustering, and pathway analysis. The normalized root mean square error (NRMSE) is used to evaluate the performance of five of the most popular methods: k-nearest neighbors, Bayesian principal component analysis (BPCA), local least squares (LLS), mean, and median imputation. When the NRMSE values of the methods were compared, LLS and BPCA were observed to outperform all other methods at all percentages of MVs (1%, 5%, 10%, and 20%). BPCA gave the best results at all percentages of MVs with respect to the number of probes or genes, whereas LLS gave the best results at all percentages of MVs with respect to the number of samples. The advantage of these two methods over the others is that they are the least affected by the complexity of the dataset.
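
The evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes a synthetic genes-by-samples matrix, entries removed completely at random, and one common NRMSE definition (root mean squared error on the removed entries divided by the standard deviation of their true values), which the abstract does not spell out. Only the two simplest methods, row-wise mean and median imputation, are shown; the function names `mask_entries`, `impute_rowwise`, and `nrmse` are hypothetical.

```python
import numpy as np


def mask_entries(data, rate, rng):
    """Return a copy of `data` with roughly `rate` of its entries set to NaN,
    together with the boolean mask of removed positions."""
    mask = rng.random(data.shape) < rate
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask


def impute_rowwise(masked, stat):
    """Fill each row's (gene's) NaNs with `stat` of its observed values.
    `stat` is np.nanmean or np.nanmedian."""
    fill = stat(masked, axis=1, keepdims=True)
    return np.where(np.isnan(masked), fill, masked)


def nrmse(true, imputed, mask):
    """Assumed definition: RMSE over the removed entries, normalized by the
    standard deviation of the true values at those entries."""
    diff = imputed[mask] - true[mask]
    return np.sqrt(np.mean(diff ** 2) / np.var(true[mask]))


# Synthetic stand-in for an expression matrix: 200 genes x 20 samples,
# each gene with its own baseline level plus noise (an assumption).
rng = np.random.default_rng(0)
gene_means = rng.normal(8.0, 2.0, size=(200, 1))
data = gene_means + rng.normal(0.0, 0.5, size=(200, 20))

# Same MV percentages as in the abstract.
for rate in (0.01, 0.05, 0.10, 0.20):
    masked, mask = mask_entries(data, rate, rng)
    for name, stat in (("mean", np.nanmean), ("median", np.nanmedian)):
        score = nrmse(data, impute_rowwise(masked, stat), mask)
        print(f"{rate:.0%} MVs, {name}: NRMSE = {score:.3f}")
```

A lower NRMSE means the imputed values are closer to the values that were removed. The same loop could in principle wrap any other imputation routine, as long as it returns a complete matrix of the same shape.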