In this study, the researchers carried out two analyses that combined electronic health record (EHR) data with gold-standard data from a well-established, clinic-based multiple sclerosis (MS) research registry. First, they used a two-stage process to develop models that predicted registry-annotated relapse events: (1) using LASSO to phenotype a key predictor of future relapse, namely relapse status over the past year, from contemporaneous EHR data; and (2) using this imputed prior 1-year relapse status, together with other algorithm-selected features, to predict relapse over the following year. Second, using registry records and electronic prescriptions, they constructed treatment groups for two pairs of commonly prescribed disease-modifying therapies (DMTs), dimethyl fumarate vs fingolimod and natalizumab vs rituximab, and used doubly robust estimation to adjust for confounding by covariates drawn from both registry and EHR data. Three outcomes were compared: 1-year relapse rate, 2-year relapse rate, and time to relapse.

In the relapse prediction analysis, the final model (age, disease duration, and imputed prior 1-year relapse history) achieved an area under the ROC curve (AUC) of 0.707. It outperformed the baseline model (age, sex, race/ethnicity, and disease duration) and was non-inferior to a model that used the actual prior 1-year relapse history. In the treatment comparisons, after adjusting for confounding and correcting for multiple testing, no significant differences in relapse were found between dimethyl fumarate and fingolimod for any of the three outcomes. By contrast, natalizumab was associated with higher relapse risk than rituximab on all three outcomes.

The research team thus presented a novel machine-learning method that predicts MS relapse over the next year with accuracy comparable to that of existing clinical prediction tools and that can be applied at the point of care, demonstrating the clinical value of EHR data in MS. They also showed how EHR data can serve as high-dimensional covariates in real-world comparisons of treatment effectiveness.
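Both analyses can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the researchers' implementation: the data are simulated rather than registry or EHR records; logistic and LASSO-logistic models are fitted by proximal gradient descent (ISTA) in plain NumPy rather than with any particular package; the penalty `l1=0.05`, the feature counts, the effect sizes, and the names `fit_logistic`, `aipw`, `prior_hat`, and so on are hypothetical; and the AIPW (augmented inverse probability weighting) estimator shown is one standard doubly robust estimator, not necessarily the exact specification used in the study. It covers only a binary 1-year relapse outcome.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; all data below are SIMULATED, not registry/EHR data


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, l1=0.0, lr=0.5, n_iter=3000):
    """Logistic regression fit by proximal gradient descent (ISTA).

    With l1 > 0 this is LASSO-penalized logistic regression (intercept unpenalized).
    Assumes the columns of X are on roughly unit scale.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        r = sigmoid(X @ w + b) - y                      # d(log-loss)/d(linear predictor)
        w = w - lr * (X.T @ r) / len(y)
        b -= lr * r.mean()
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)  # soft-thresholding = L1 prox step
    return w, b


def predict(model, X):
    w, b = model
    return sigmoid(X @ w + b)


def auc(y, score):
    """ROC AUC via the Mann-Whitney rank statistic (assumes no tied scores)."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Analysis 1: two-stage relapse prediction on a hypothetical cohort.
n, p = 3000, 30
ehr = rng.normal(size=(n, p))                           # stand-in for standardized EHR features
prior = rng.binomial(1, sigmoid(ehr[:, :3] @ np.array([1.5, 1.0, -1.0]) - 0.5))
age, duration = rng.normal(size=n), rng.normal(size=n)  # standardized covariates
future = rng.binomial(1, sigmoid(-1.0 + 1.2 * prior - 0.4 * age - 0.3 * duration))
train = np.arange(n) < 2000
test = ~train

# Stage 1: LASSO phenotype of prior 1-year relapse, trained where "registry" labels exist.
phen = fit_logistic(ehr[train], prior[train], l1=0.05)
prior_hat = predict(phen, ehr)                          # imputed prior-relapse probability for everyone
# Stage 2: future 1-year relapse from age, disease duration and the imputed prior status.
Z = np.column_stack([age, duration, prior_hat])
risk = fit_logistic(Z[train], future[train])
test_auc = auc(future[test], predict(risk, Z[test]))


# Analysis 2: doubly robust (AIPW) comparison of two hypothetical treatments.
def aipw(X, a, y):
    """AIPW estimate of E[Y(1)] - E[Y(0)] for a binary outcome."""
    e = np.clip(predict(fit_logistic(X, a), X), 0.01, 0.99)   # propensity P(A=1 | X)
    mu1 = predict(fit_logistic(X[a == 1], y[a == 1]), X)      # outcome model, arm A=1
    mu0 = predict(fit_logistic(X[a == 0], y[a == 0]), X)      # outcome model, arm A=0
    psi = mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e)
    return float(psi.mean())


m = 6000
C = rng.normal(size=(m, 3))                                   # confounders
a = rng.binomial(1, sigmoid(C @ np.array([1.0, -0.8, 0.6])))  # treatment depends on C
lin = -0.5 + C @ np.array([0.8, 0.5, 0.6])                    # relapse also depends on C
p1, p0 = sigmoid(lin + 0.7), sigmoid(lin)
y = rng.binomial(1, np.where(a == 1, p1, p0))
true_ate = float(np.mean(p1 - p0))
naive = float(y[a == 1].mean() - y[a == 0].mean())            # confounded difference in means
ate_hat = aipw(C, a, y)
print(f"test AUC={test_auc:.3f}  true={true_ate:.3f}  naive={naive:.3f}  AIPW={ate_hat:.3f}")
```

The estimator is "doubly robust" because `psi` averages to the treatment effect if either the propensity model or the two outcome models are correctly specified; in this simulation both are, so AIPW should land near `true_ate` while the naive difference in means is biased by confounding. ISTA keeps the sketch dependency-free; a real analysis would more likely use a tested library (for example scikit-learn's `LogisticRegression` with `penalty="l1"`) and would need a survival outcome model for time to relapse.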