A machine learning (ML) algorithm trained to run silently in the background of an electronic health record (EHR) system predicted short-term mortality risk in cancer patients more accurately than traditional prognostic classifiers, the first prospective study of its kind has shown.
In a cohort of 24,582 patients, 4.2% died within 180 days of their index encounter at one of 18 medical or gynecologic oncology practices within a large US academic health care system, senior author Ravi Parikh, MD, University of Pennsylvania Perelman School of Medicine in Philadelphia, Pennsylvania, and colleagues reported in JAMA Oncology.
At a prespecified mortality risk threshold of 40%, 2.5% of patients were identified as being at high risk for death at 180 days, investigators noted. At this threshold, observed 180-day mortality was 45.2% in the high-risk group, which is the algorithm's positive predictive value (PPV), compared with 3.1% in the low-risk group, corresponding to a negative predictive value (NPV) of 96.9%, they added.
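To show how these figures relate to one another, here is a minimal sketch that back-calculates an approximate 2x2 table from the rounded percentages reported above. The resulting counts are estimates derived from rounded inputs, not figures taken from the study, and the function names `approximate_confusion_counts` and `summarize` are hypothetical.

```python
def approximate_confusion_counts(n_patients, death_rate, flagged_rate, ppv):
    """Back-calculate an approximate 2x2 table from rounded, reported rates.

    All rates are fractions (0.042 means 4.2%). The outputs are estimates
    only, because the published percentages are rounded.
    """
    deaths = n_patients * death_rate        # patients who died within 180 days
    flagged = n_patients * flagged_rate     # patients flagged as high risk
    true_pos = flagged * ppv                # flagged patients who died
    false_pos = flagged - true_pos          # flagged patients who survived
    false_neg = deaths - true_pos           # deaths among unflagged patients
    true_neg = (n_patients - flagged) - false_neg
    return {"tp": true_pos, "fp": false_pos, "fn": false_neg, "tn": true_neg}


def summarize(counts):
    """Standard classification metrics computed from a 2x2 table."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Rounded values as reported in the article.
counts = approximate_confusion_counts(24582, 0.042, 0.025, 0.452)
metrics = summarize(counts)
```

With these rounded inputs, the sketch returns an NPV of about 96.9%, matching the reported value, and implies a sensitivity of roughly 27%. The study's exact values may differ slightly because of rounding.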
This performance was significantly better than the predictive ability of 2 other commonly used prognostic indices, the Eastern Cooperative Oncology Group (ECOG) and Elixhauser comorbidity index-based classifiers.
“Underestimating mortality risk is associated with aggressive end-of-life care among patients with cancer,” Parikh and colleagues observed.
“[O]ur results suggest that an ML algorithm can be feasibly integrated into the EHR to generate real-time, accurate predictions of short-term mortality risk for patients with cancer that outperform traditional prognostic indices,” they concluded.
Investigators had previously trained the algorithm using 559 structured EHR features as inputs with historic data from 2016. Certain variables, notably ECOG performance status, stage and genetic variants were not included in the algorithm. The trained model then ran silently in the background of the University of Pennsylvania Health System EHR from March 1 to April 30, 2019.
“[P]atients who had an encounter in 1 of 18 practices during the baseline period were followed up for 180 days after the index encounter,” the authors explained.
The median age of the cohort was 64.6 years (interquartile range [IQR], 53.6-73.2 years) and slightly over 62% were female. There was one tertiary care practice within the health care system and slightly over 43% of patients were seen in this particular practice.
“The primary performance metric was the area under the receiver operating characteristic curve (AUC),” researchers explained. The AUC is a widely used metric for evaluating a classification model’s performance: the higher the AUC, the better the model distinguishes patients who experience the outcome from those who do not.
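One way to read the AUC is as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived. The following is a minimal sketch of that pairwise definition on toy data. The labels and scores are invented for illustration and the function name `auc_from_scores` is hypothetical; this is not the study's evaluation code.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the outcome (e.g. death within 180 days), 0 otherwise.
    scores: predicted risk for each patient. Tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc_from_scores(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of patients, so a rank-based implementation would be preferable for a cohort of this size.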
In the overall cohort, the AUC of the ML algorithm was 0.89 (95% CI, 0.88-0.90), investigators noted. The performance of the algorithm did not differ between patients seen in general oncology practices compared to those seen at the tertiary academic center, although performance varied across disease-specific groups within the tertiary practice, they found.
For example, performance ranged from an AUC of 0.74 for neuro-oncology to an AUC of 0.96 for breast oncology. The algorithm also performed slightly better for women at an AUC of 0.91 compared to an AUC of 0.86 for men.
In contrast, “[t]here were no significant differences in performance across race/ethnicity and insurance status,” Manz and colleagues emphasized.
As investigators explained, both ECOG and Elixhauser comorbidity scores are validated prognostic indices commonly used to make decisions about treatment and clinical trial enrollment for cancer patients.
Only about one-quarter of the cohort had a coded ECOG score; of these patients, 15.5% had an ECOG score greater than or equal to 2.
“Compared with the baseline ECOG-only classifier, the enhanced classifier integrating the ECOG and ML classifiers had a significantly higher AUC…and higher PPV,” investigators observed—the AUC being 0.17 (95% CI, 0.14-0.19) higher in favor of the integrated classifier and the PPV being 0.18 higher for the same integrated approach.
This was also true when the Elixhauser-only classifier was compared with the enhanced classifier integrating the Elixhauser and ML classifiers: the integrated classifier again had a significantly higher AUC, with a difference of 0.20 (95% CI, 0.18-0.21), and a PPV that was 0.36 higher.
However, it was noteworthy that the algorithm, while well calibrated for patients with a low mortality risk at 180 days of 10% or less, overestimated mortality risk among patients whose mortality risk was greater than 10%, as Manz and colleagues acknowledged.
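Calibration of this kind is usually checked by grouping patients by predicted risk and comparing the mean predicted risk in each group with the observed death rate. Below is a minimal sketch of such a check. The bin edges, the function name `calibration_table`, and the toy data are all assumptions made for illustration and do not come from the study.

```python
def calibration_table(predicted, observed, edges=(0.0, 0.1, 0.2, 0.4, 1.0)):
    """Compare mean predicted risk with observed event rate per risk bin.

    predicted: predicted probabilities in [0, 1].
    observed:  1 if the event occurred, else 0.
    edges:     hypothetical bin boundaries; the last bin includes its upper edge.
    Returns rows of (low, high, n, mean_predicted, observed_rate).
    """
    rows = []
    n_bins = len(edges) - 1
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        is_last = i == n_bins - 1
        idx = [
            j for j, p in enumerate(predicted)
            if lo <= p < hi or (is_last and p == hi)
        ]
        if not idx:
            continue  # skip empty bins
        mean_pred = sum(predicted[j] for j in idx) / len(idx)
        obs_rate = sum(observed[j] for j in idx) / len(idx)
        rows.append((lo, hi, len(idx), mean_pred, obs_rate))
    return rows


# Toy data, invented for illustration only.
toy_pred = [0.02, 0.05, 0.08, 0.15, 0.30, 0.50, 0.70, 0.90]
toy_obs = [0, 0, 0, 0, 0, 1, 0, 1]
toy_rows = calibration_table(toy_pred, toy_obs)
```

A bin in which the mean predicted risk exceeds the observed rate indicates overestimation, which is the pattern the authors described for patients with a predicted risk above 10%.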
“[O]ur algorithm demonstrated good performance on several clinically relevant metrics even when applied to a prospective cohort 3 years later,” the authors pointed out.
This is important, they implied, as it showed that a mortality risk prediction algorithm trained on retrospective data still performs well and has good clinical utility when applied to a more recent cohort of cancer patients.
Investigators also pointed out that the enhanced classifier integrating the ML algorithm into existing prognostic indices resulted in better reclassification of patients.
For example, the ML algorithm had a 0.17 higher AUC (95% CI, 0.14-0.19) than the ECOG prognostic index and a 0.20 (95% CI, 0.18-0.21) higher AUC than the Elixhauser prognostic index.
Despite the superior performance of the ML algorithm, the authors cautioned that it remains important to demonstrate that this improved accuracy in mortality prediction translates into better clinical decision-making.
To demonstrate that their algorithm does lead to better decision-making, investigators carried out a randomized trial in which they evaluated whether the new classifier's identification of high-risk cancer patients triggered more end-of-life conversations.
Results from this study indicated that the algorithm did prompt more such conversations, suggesting that mortality risk prediction tools may indeed prompt physicians to discuss end-of-life care with patients earlier.
Commenting on the study’s findings, John Kang, MD, PhD, University of Washington, Seattle, Washington, and co-editorialists acknowledged that the authors are taking important steps in testing how ML can improve patient care.
Nevertheless, with a selected 2.5% “alert rate” (namely, the share of encounters assessed as high-risk), the PPV of the ML algorithm was low, as was its sensitivity.
“The low sensitivity indicates that the selected cutoff does not capture the majority of short-term mortality [while] the lower PPV indicates that around 50% of flagged patients survived past 180 days,” as Kang and colleagues pointed out—limitations that clearly could affect potential clinical usefulness of the ML algorithm, as they suggested.
Given this, Kang and colleagues suggested that conservative interventions such as serious illness conversations might be preferred by clinicians.
They also pointed out that this particular ML algorithm, which is based on EHR data from a large academic health system in Pennsylvania, will not necessarily transfer to other institutions whose EHR data may differ considerably.
“In the next decade, we will continue to see computational models for individual decision-making beyond staging guidelines and prognostic tools,” Kang and colleagues predicted.
“[B]ut as we improve data sharing and standards, we can look forward to more nuanced decisions as well,” they suggested.
A machine learning algorithm trained to run silently in the background of an electronic health record system predicted mortality risk in cancer patients at 180 days better than traditional prognostic classifiers.
It was feasible to automatically flag patients at high risk for short-term mortality while accurately ruling out low-risk patients for the same endpoint.
Pam Harrison, Contributing Writer, BreakingMED™
The study was supported by a grant from the Penn Center for Precision Medicine Accelerator Fund among others.
Parikh reported receiving personal fees from GNS Healthcare and Cancer Study Group as well as grants from Conquer Cancer Foundation, among other foundations and institutes.