A machine learning algorithm used to analyse EHRs identified high-risk individuals who could potentially benefit from HIV PrEP, according to a report presented this week at IDWeek 2016 in New Orleans. Out of 800,000 patients in a large EHR database, more than 8,000 were found to be potential PrEP candidates.
We extracted potentially relevant demographics (e.g. age, sex, race), diagnoses (e.g. anal dysplasia, substance use disorders), prescriptions (e.g. Suboxone, sildenafil), laboratory tests (e.g. syphilis, gonorrhea, chlamydia, hepatitis C), and procedures (e.g. anal Pap smear) from the EHR repository of Atrius Health, a large ambulatory practice group in Massachusetts with ~800,000 patients. We matched each patient with incident HIV during 2006-2015 to 100 HIV-uninfected controls with similar gender and duration of affiliation with Atrius Health as of the year of HIV diagnosis. We developed logistic regression models and machine learning algorithms to predict incident HIV infection in cases vs controls. Machine learning methods included LASSO, Ridge, SVM, and super learner. We assessed area under the curve (AUC), sensitivity, and positive predictive value (PPV) of each algorithm and compared prediction scores among 45 patients independently started on PrEP by Atrius clinicians versus the general population.
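To make the AUC metric concrete, the following is a minimal sketch of how it can be computed from risk scores and case/control labels. It is not the study's code: the function name `auc` and the toy scores are illustrative assumptions, and the study's cross-validation and model fitting are not reproduced here.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen case (label 1)
    has a higher score than a randomly chosen control (label 0).
    Tied scores count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (not from the study): two cases, two controls.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to scores that rank cases no better than chance, and 1.0 to scores that rank every case above every control.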
There were 138 incident HIV infections in the population. Ten-fold cross-validated AUC was highest with the Ridge method (AUC 0.77, sensitivity 59%, PPV 0.34% for patients with risk scores in the top quintile). Strong predictors included a prior diagnosis of anorectal ulcer, total number of positive gonorrhea tests, and acute HIV testing in the past two years. Median HIV prediction scores for patients independently started on PrEP by Atrius clinicians were significantly higher than those of the general population (median 5.43×10⁻⁴ vs 9.05×10⁻⁵, P<.0001). There were 3,229 patients in the general population (0.37%) with risk scores above the median scores of patients started on PrEP by Atrius clinicians. These are promising potential candidates to evaluate for interest and suitability for PrEP.
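Sensitivity and PPV for a score cutoff such as the top quintile can be illustrated with a short sketch. It assumes a simple rule in which the highest-scoring 20% of patients are flagged. The function name `top_fraction_metrics` and the ten-patient data set are hypothetical and do not reproduce the reported figures.

```python
def top_fraction_metrics(scores, labels, fraction=0.2):
    """Flag the highest-scoring `fraction` of patients and return
    (sensitivity, ppv) for that flagged group.

    sensitivity = flagged cases / all cases
    ppv         = flagged cases / all flagged patients
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_cases = sum(labels)
    if total_cases == 0:
        raise ValueError("need at least one case")

    # Number of patients to flag; always flag at least one.
    n_flagged = max(1, int(len(scores) * fraction))

    # Sort by score, highest first. Ties at the cutoff are broken by
    # input order (Python's sort is stable), a simplification.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    flagged_cases = sum(label for _, label in ranked[:n_flagged])

    return flagged_cases / total_cases, flagged_cases / n_flagged


# Hypothetical toy data: 10 patients, 2 of whom are cases.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
sensitivity, ppv = top_fraction_metrics(toy_scores, toy_labels)
```

When the outcome is rare, PPV can be low even at moderate sensitivity, because the flagged group is dominated by patients who never acquire the infection.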
Automated analysis of data routinely stored in EHRs can identify patients at increased risk for incident HIV who are potential candidates for PrEP.