Researchers need more than RCT data to improve model accuracy

Artificial intelligence is not yet smart enough to reliably predict who will survive metastatic castration-resistant prostate cancer (CRPC), Stanford University researchers report.

A machine-learning model based on data from randomized clinical trials (RCTs) did not perform well when applied directly to real-world data, according to results published in JAMA Network Open.

Nevertheless, the researchers are encouraged. They hope that feature-reduction strategies, combined with retraining the model to incorporate the top predictive features from real-world data, can yield a model better attuned to routine health care and to more diverse patient populations. Building models on RCT data may offer better data quality and integrity, while real-world data may provide a better basis for decision-making.

“Although a number of strategies exist for managing metastatic CRPC, the optimal selection and sequencing of these therapies is an active area of research. Better estimation of the individual risk profiles of patients with metastatic CRPC may facilitate more appropriate matching of patients to treatment pathways or clinical trials, enable personalization of second-line treatment regimens, and allow population-level evaluations,” explained researchers led by Jean Coquet, PhD, of Stanford University School of Medicine, Stanford, CA.

Coquet and colleagues conducted the study to assess the performance of a prognostic model derived from RCTs that predicts survival in patients with metastatic CRPC when applied to real-world data from electronic health records (EHRs). They applied the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge RCT-trained model to real-world data obtained from the EHRs of a tertiary care academic medical center with a comprehensive cancer center.

Previously, DREAM challenge researchers used data from four phase III RCTs to build the best-performing survival model for patients with metastatic CRPC.

For their survival model, Coquet and colleagues included 2,113 patients: 1,600 from the RCT cohort and 513 from the EHR cohort. The two cohorts differed considerably. Most participants in the RCT cohort were white (86.9% versus 65.7% in the EHR cohort), while a greater proportion of the EHR cohort were older than 75 years (37.2% versus 24.3% in the RCT cohort), and EHR patients had a greater mean number of comorbidities (2.5 versus 1.8, respectively).

The DREAM challenge RCT-trained model was applied to the real-world data, and then retrained with optimized feature selection using these data.

The RCT-derived model used 101 variables, 10 of which were not included in the EHR data set. Of these, 3 were among the top 10 features in the DREAM challenge RCT model, namely Eastern Cooperative Oncology Group (ECOG) score and two variables specifying metastases in the liver and lymphatic system. Seven of the 91 remaining variables were not consistently recorded in the EHRs and were removed.

With the remaining 84 variables, and with missing values imputed, the original DREAM challenge RCT-trained model achieved a 10-fold mean integrated area under the curve (iAUC) of 0.722 in the EHR data set, compared with 0.768 in the original DREAM challenge test data set.
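For readers who want a concrete sense of this kind of evaluation, here is a minimal Python sketch, assuming the open-source scikit-survival library, median imputation, and a plain Cox model as a stand-in for the DREAM challenge model (the study's actual code and imputation strategy are not published here):

```python
# Minimal sketch: impute missing feature values, then estimate a 10-fold
# mean time-dependent AUC (iAUC) for a survival model on tabular data.
# Assumptions: scikit-survival is installed; X is a NumPy feature matrix
# (possibly containing NaNs); eval_times lie within the test follow-up range.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import cumulative_dynamic_auc
from sksurv.util import Surv

def mean_iauc_10fold(X, time, event, eval_times):
    """Average the integrated time-dependent AUC over 10 CV folds."""
    y = Surv.from_arrays(event=event, time=time)
    scores = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                     random_state=0).split(X):
        imputer = SimpleImputer(strategy="median")  # assumed strategy
        X_train = imputer.fit_transform(X[train_idx])
        X_test = imputer.transform(X[test_idx])
        model = CoxPHSurvivalAnalysis().fit(X_train, y[train_idx])
        risk = model.predict(X_test)                # higher = worse prognosis
        _, mean_auc = cumulative_dynamic_auc(y[train_idx], y[test_idx],
                                             risk, eval_times)
        scores.append(mean_auc)
    return float(np.mean(scores))
```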

Researchers retrained the model using the 84 variables available in the EHR data set, which resulted in a mean iAUC of 0.762. They then removed 24 variables to create a 60-variable model. Further optimization via feature selection produced models containing 10 to 60 features, with mean iAUC values ranging from 0.775 to 0.792.
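The paper does not spell out the exact selection procedure, but a simple cross-validation-driven backward elimination of the kind sketched below, reusing the hypothetical mean_iauc_10fold helper from the earlier sketch, illustrates how models of decreasing size can be scored:

```python
# Illustrative backward feature elimination driven by the 10-fold mean iAUC:
# at each step, drop the feature whose removal hurts the CV score least, and
# record the surviving subset at every size down to min_features. This
# brute-force loop is for illustration only (O(p^2) model fits); the study's
# actual optimization may well differ.
def backward_select(X, time, event, eval_times, min_features=10):
    features = list(range(X.shape[1]))
    results = {}                    # subset size -> (mean iAUC, column indices)
    while len(features) >= min_features:
        score = mean_iauc_10fold(X[:, features], time, event, eval_times)
        results[len(features)] = (score, list(features))
        if len(features) == min_features:
            break
        drop = max(features, key=lambda f: mean_iauc_10fold(
            X[:, [g for g in features if g != f]], time, event, eval_times))
        features.remove(drop)
    return results
```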

The top 10 predictive features in the EHR-trained models included lymphocyte levels, the ratio of neutrophils to blood cells, spinal cord compression, body mass index, and pulse. Of these 10 features, 5 were not among the top 10 features in the DREAM challenge RCT-derived model.

Race was not a significant predictor of survival in either cohort. To determine the impact of racial disparities on risk scores, Coquet and colleagues calculated algorithmic risk scores by comorbidities and prostate-specific antigen (PSA) values, stratified by race. They found that, to reach the same risk scores, black and Hispanic patients needed more chronic conditions and higher PSA values than white and Asian patients.

The best-performing EHR model included 25 variables and achieved a mean iAUC of 0.792. Using this model, researchers classified patients into high- and low-risk mortality groups, and the difference in survival between the two groups was significant (HR: 2.7; 95% CI: 2.0-3.7; P < 0.001).
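As a rough illustration of this kind of risk stratification, the following sketch (assuming the open-source lifelines library, with a median risk-score cutoff as an arbitrary illustrative choice rather than the study's threshold) dichotomizes patients by model risk score and estimates the hazard ratio between the groups:

```python
# Minimal sketch: split patients into high- and low-risk groups at a
# risk-score cutoff and estimate the hazard ratio with a one-covariate
# Cox model. Assumes lifelines is installed; cutoff defaults to the median.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def risk_group_hazard_ratio(risk_scores, time, event, cutoff=None):
    cutoff = np.median(risk_scores) if cutoff is None else cutoff
    df = pd.DataFrame({
        "high_risk": (np.asarray(risk_scores) > cutoff).astype(int),
        "time": time,
        "event": event,    # 1 = death observed, 0 = censored
    })
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    row = cph.summary.loc["high_risk"]
    return (row["exp(coef)"],                       # hazard ratio
            (row["exp(coef) lower 95%"], row["exp(coef) upper 95%"]),
            row["p"])
```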

“The top predictive features from our model differed slightly from the RCT model based on the availability of specific variables in the EHRs. In our model, the presence of high alkaline phosphatase levels, high aspartate aminotransferase levels, and high PSA values at the time of CRPC diagnosis were predictive of a worse prognosis,” wrote Coquet and fellow researchers, who are hopeful that future efforts will result in a clinical decision aid personalized to individual patient health records.

In an accompanying editorial, Joe V. Selby, MD, MPH, former Executive Director of the Patient-Centered Outcomes Research Institute, Washington, DC, and Bruce H. Fireman, of The Permanente Medical Group, Oakland, CA, expounded on the value of developing predictive models for patient care, and the challenges inherent in using both RCT and real-world data to do so.

“Predictive models are a tool of increasing value for practicing personalized, patient-centered medicine, providing both patients and their clinicians with individualized information on prognosis or response to therapy. An individual’s predicted probability of dying or experiencing nonfatal complications or disease progression is often assumed to be a marker of the need for further treatment, the reasoning being that persons at greater risk have the most to gain from treatment. Few models have been developed to date to directly predict individual response to specific treatments,” they wrote.

Although RCT-derived data are perhaps superior, they added, there are hurdles to overcome in obtaining such data.

“However, sharing of RCT data sets that include experimental treatment groups has been slow to occur despite forceful encouragement because of the frequent proprietary and competitive concerns of trial sponsors,” they added.

“To better meet the needs of patient and clinician decision-makers, continued efforts to require and encourage prompt sharing of full trial data by sponsors are warranted. As our methods improve and our experience with predicting treatment response increases, we may reach a point at which we ask whether real-world EHR data may also have advantages for this purpose. How exciting it would be to study the predictors of treatment response in nearly real time,” concluded Selby and Fireman.

Limitations of this study include differences in care-seeking between real-world and RCT settings, as well as a lack of data on treatment with intermittent hormone therapy and on the dates of prescribed medications.

  1. In this study in patients with metastatic castration-resistant prostate cancer (CRPC), predictions from models trained on data from RCTs could not be generalized to the real-world setting with data from electronic health records (EHRs).

  2. Prognostic models derived from RCT data provide insights into mortality in patients with metastatic castration-resistant prostate cancer, but their direct application to real-world data requires local optimization.

E.C. Meszaros, Contributing Writer, BreakingMED™

This study was supported by the National Cancer Institute of the National Institutes of Health.

Coquet, Selby, and Fireman reported no disclosures.
