Early detection of sepsis can enable timely clinical intervention with effective treatment and may reduce sepsis mortality. Machine learning-based automated diagnosis of sepsis from easily recorded physiological data may therefore be more promising than the gold-standard rule-based clinical criteria used in current practice. This study develops a machine learning framework that quantifies the heterogeneity within tabular electronic health record (EHR) data of clinical covariates, capturing both linear relationships and nonlinear correlations, for the early prediction of sepsis. For every 6-hour window of the EHR data, with 24 selected covariates, the pairwise association of each hour-covariate pair is described using a pointwise mutual information (PMI) matrix, which represents the heterogeneity of the data as a two-dimensional map. These matrices are stacked along the z-axis, as slices in the xy plane, to form a 3-way tensor for each record with its corresponding length of stay (L). The fused tensor of every record is factorized using Tucker decomposition; only the core tensor is retained, and the three unitary factor matrices are excluded, so that the core provides the latent feature set for predicting sepsis onset. Under a five-fold cross-validation scheme, the 120 latent features obtained from the reshaped core tensor are fed to Light Gradient Boosting Machine (LightGBM) models for binary classification, which further alleviates the class imbalance involved. The framework is designed using Bayesian optimization and yields an average normalized utility score of 0.4519, as defined by the challenge organizers, and an area under the receiver operating characteristic curve (AUROC) of 0.8621 on the publicly available PhysioNet/Computing in Cardiology Challenge 2019 training data.
The proposed tensor decomposition of the 3-way fused tensor, formed from PMI matrices, leverages higher-order temporal interactions between the pairwise associations among the clinical values for early prediction of sepsis. This is validated by improved risk prediction for every hour of ICU admission in terms of utility score, AUROC, and F1 score. The results show a significant improvement, particularly in utility score (approximately 1.5-2%), under five-fold cross-validation on the entire training data compared with a top-ranking study entered in the challenge.
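The feature-extraction steps described above (PMI matrix per 6-hour window, stacking into a 3-way tensor, Tucker decomposition, flattening the core) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: covariate values are taken to be non-negative (for example, min-max scaled) so that a window can be normalized like a co-occurrence table; the Tucker decomposition is computed with a truncated higher-order SVD (HOSVD), one common non-iterative way to obtain it; and the ranks (4, 6, 5) are hypothetical, chosen only so that the flattened core has 120 entries. The names `pmi_matrix`, `truncated_hosvd`, `num_windows`, and `ranks` are illustrative.

```python
import numpy as np


def pmi_matrix(window, eps=1e-12):
    """PMI between rows (hours) and columns (covariates) of one window.

    Assumption: `window` is non-negative, so it can be normalized into a
    joint distribution over (hour, covariate) pairs.
    """
    joint = window / window.sum()
    p_hour = joint.sum(axis=1, keepdims=True)        # marginal over covariates
    p_covariate = joint.sum(axis=0, keepdims=True)   # marginal over hours
    return np.log((joint + eps) / (p_hour * p_covariate + eps))


def unfold(tensor, mode):
    """Mode-n unfolding: the chosen mode becomes the rows of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along the given mode."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Tucker decomposition via truncated HOSVD: returns (core, factors)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])                  # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)             # project onto each factor basis
    return core, factors


# Synthetic stand-in for one record: 12 windows of 6 hours x 24 covariates.
rng = np.random.default_rng(0)
num_windows = 12                                     # hypothetical number of slices
windows = rng.uniform(0.05, 1.0, size=(num_windows, 6, 24))

# One PMI matrix per window, stacked along the third axis.
tensor = np.stack([pmi_matrix(w) for w in windows], axis=2)   # shape (6, 24, 12)

ranks = (4, 6, 5)                                    # hypothetical; 4 * 6 * 5 = 120
core, factors = truncated_hosvd(tensor, ranks)

# Keep only the core tensor, flattened, as the latent feature vector.
features = core.reshape(-1)
print(features.shape)                                # (120,)
```

In a complete pipeline, a feature vector like `features` would be passed to a gradient-boosting classifier such as LightGBM; that step is omitted here to keep the sketch dependency-free.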
Copyright © 2021 Elsevier Ltd. All rights reserved.
