Gene expression data generated by microarray technology are often analyzed for disease diagnostics and treatment. However, such data suffer from missing values, which may lead to inaccurate findings. Because data capture is expensive, time-consuming, and requires collection from subjects, it is worthwhile to recover missing values instead of re-collecting the data. This paper proposes a novel but simple method, DSNN (Doubly Sparse DCT domain with Nuclear Norm minimization), for imputing missing values in microarray data. Extensive experiments, including pathway enrichment, were carried out on four blood cancer datasets to validate the method and to establish the significance of imputation. The four datasets were CLL, AML, MM (Spanish data), and MM (Indian data), all downloaded from the GEO repository. Because method validation requires ground truth, missing values were introduced into the original data at rates from 10% to 90% in steps of 10%. The normalized mean square error (NMSE) between the ground truth and the imputed data was computed as the quantitative measure. To further validate the proposed imputation method and establish its significance, two experiments were carried out on three versions of the data: data imputed with the proposed method, data imputed with state-of-the-art methods, and data with missing values. In the first experiment, normal vs. cancer subjects were classified. In the second experiment, the biological significance of imputation was ascertained by identifying top candidate tumor drivers using the existing state-of-the-art SPARROW algorithm, followed by gene list enrichment analysis on the top candidate drivers. The NMSE results of the DSNN method were compared with those of three state-of-the-art imputation methods.
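The evaluation protocol described above (hide a known fraction of entries, impute them, and score the result against the ground truth with NMSE) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: the NMSE is taken to be the squared Frobenius norm of the error divided by the squared Frobenius norm of the ground truth, entries are hidden uniformly at random, the data are synthetic, and a simple row-mean imputer stands in for DSNN, which is not implemented here. All function names are hypothetical.

```python
import numpy as np


def mask_entries(data, missing_fraction, rng):
    """Return a boolean mask (True = observed) that hides the requested
    fraction of entries, chosen uniformly at random (an assumption)."""
    n_total = data.size
    n_missing = int(round(missing_fraction * n_total))
    flat = np.ones(n_total, dtype=bool)
    flat[rng.choice(n_total, size=n_missing, replace=False)] = False
    return flat.reshape(data.shape)


def row_mean_impute(data, observed):
    """Placeholder imputer (NOT DSNN): fill each missing entry with the
    mean of the observed entries in the same row (gene)."""
    filled = data.copy()
    for i in range(data.shape[0]):
        row_obs = observed[i]
        # Fall back to 0.0 if an entire row happens to be hidden.
        fill = data[i, row_obs].mean() if row_obs.any() else 0.0
        filled[i, ~row_obs] = fill
    return filled


def nmse(truth, estimate):
    """Assumed NMSE definition: ||truth - estimate||_F^2 / ||truth||_F^2."""
    return np.sum((truth - estimate) ** 2) / np.sum(truth ** 2)


rng = np.random.default_rng(0)
# Synthetic stand-in for a genes x samples expression matrix.
truth = rng.normal(loc=8.0, scale=2.0, size=(200, 30))

results = {}
for pct in range(10, 100, 10):  # 10% to 90% missing, in steps of 10%
    observed = mask_entries(truth, pct / 100, rng)
    imputed = row_mean_impute(truth, observed)
    results[pct] = nmse(truth, imputed)
    print(f"{pct}% missing: NMSE = {results[pct]:.4f}")
```

Replacing `row_mean_impute` with any other imputer that takes the same two arguments lets several methods be compared on identical masks, which mirrors the comparison reported in the abstract.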
The DSNN method was observed to perform better than these other methods at both high and low fractions of observed data. Experiment 1 demonstrated that classification on data imputed with the proposed method was superior both to classification on the data matrix with missing values and to classification on data imputed with existing methods. In experiment 2, cancer-affected pathways were discovered with higher significance in the data imputed with the proposed method than in the data matrix with missing values. Missing values in microarray data are a serious problem that can adversely influence downstream analysis. A novel method, DSNN, is proposed for missing value imputation. The method is validated quantitatively, on the application of classification, and biologically, by performing pathway enrichment analysis. Copyright © 2020 Farswan, Gupta, Gupta and Kaur.
March 5, 2020