Classifying subjects into clinically and biologically homogeneous subgroups can facilitate the understanding of disease pathophysiology and the development of targeted prevention and intervention strategies. Traditionally, disease subtyping is based on clinical characteristics alone, but subtypes identified this way may not conform exactly to the underlying biological mechanisms. Very few studies have integrated genomic profiles (e.g., those from GWASs) with clinical symptoms for disease subtyping. Here we propose an analytic framework for identifying subgroups of complex diseases that leverages both GWAS-predicted gene expression levels and clinical data through multi-view bicluster analysis. Because this approach connects SNPs to genes via their effects on expression, the analysis is more biologically relevant and interpretable than a purely SNP-based analysis. Transcriptomes of different tissues can also be readily modeled. We also propose several evaluation metrics for assessing clustering performance. Our framework subtyped schizophrenia subjects into distinct subgroups with different prognoses and treatment responses. We also applied the framework to the Northern Finland Birth Cohort (NFBC) 1966 dataset and identified high and low cardiometabolic risk subgroups in a gender-stratified analysis. The prediction strength estimated by cross-validation was generally greater than 80%, suggesting good stability of the clustering model. Our results point toward a more data-driven and biologically informed approach to defining metabolic syndrome and subtyping psychiatric disorders. Moreover, the genes "blindly" selected by the algorithm were significantly enriched for known susceptibility genes discovered in GWASs of schizophrenia or cardiovascular diseases. The proposed framework opens up a new approach to subject stratification.
Copyright © 2019 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
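The abstract reports cluster stability as "prediction strength by cross-validation" without defining it. A minimal sketch follows, assuming the prediction-strength statistic of Tibshirani and Walther. For one fold, the test set is clustered on its own (`test_labels`), and each test subject is also assigned a cluster by the model fitted on the training set (`predicted_labels`). For every test cluster, the statistic takes the fraction of within-cluster pairs that the training-set model also places together, then reports the minimum over clusters. The function name, the handling of singleton clusters, and the per-fold framing are illustrative assumptions, not the authors' implementation; averaging over folds would give a cross-validated value.

```python
from collections import defaultdict
from itertools import combinations


def prediction_strength(test_labels, predicted_labels):
    """Hypothetical helper: prediction strength for a single cross-validation fold.

    test_labels[i]      -- cluster of test subject i from clustering the test set itself
    predicted_labels[i] -- cluster assigned to test subject i by the training-set model
    Only co-membership matters, so the two label sets need not use the same IDs.
    """
    if len(test_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")

    # Group test-subject indices by their test-set cluster.
    members_by_cluster = defaultdict(list)
    for idx, label in enumerate(test_labels):
        members_by_cluster[label].append(idx)

    strengths = []
    for members in members_by_cluster.values():
        if len(members) < 2:
            # Assumption: singleton clusters have no pairs and are skipped.
            continue
        pairs = list(combinations(members, 2))
        # Count pairs that the training-set model also puts in one cluster.
        agree = sum(1 for i, j in pairs if predicted_labels[i] == predicted_labels[j])
        strengths.append(agree / len(pairs))

    # The statistic is the weakest cluster's co-membership proportion.
    return min(strengths) if strengths else float("nan")


if __name__ == "__main__":
    # Toy labels (not data from the study): one test cluster is split by the model.
    ps = prediction_strength([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
    print(round(ps, 3))  # 0.333
```

In the toy example, cluster 1 is fully reproduced (proportion 1.0) but only one of three pairs in cluster 0 stays together, so the minimum is 1/3. Taking the minimum rather than the mean makes a single unstable subgroup pull the score down. This is why values above roughly 0.8 are commonly read as indicating a stable clustering.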