We used the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) database, which contains aggregated electronic health information from a cohort of primary care practices, to develop and evaluate a prognostic prediction model that estimates 5-year osteoarthritis risk while addressing contextual challenges of data availability and missingness. We constructed a retrospective cohort of 383,117 eligible primary care patients who had an encounter with their primary care practitioner between 1 January 2009 and 31 December 2010. Patients were excluded if they had a diagnosis of osteoarthritis prior to their first visit in this period. Incident cases of osteoarthritis were observed. The model was constructed to predict incident osteoarthritis from age, sex, BMI, previous leg injury, and osteoporosis. The model was evaluated using internal 10-fold cross-validation; we argue that internal validation is particularly appropriate for a model that is to be integrated into the same context from which the data were derived.
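As an illustration of the internal 10-fold cross-validation described above, the following minimal sketch produces out-of-fold predictions, so that every patient is scored by a model that did not see that patient during fitting. It is not the authors' implementation: the function names (`k_fold_indices`, `cross_val_predict`) and the toy prevalence model are assumptions introduced here, and the abstract does not state which model family or software was used.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Striding by k gives fold sizes that differ by at most one.
    return [idx[i::k] for i in range(k)]

def cross_val_predict(X, y, fit, predict, k=10, seed=0):
    """Return out-of-fold predictions for every row of X.

    `fit(X_train, y_train)` returns a model object and
    `predict(model, x)` returns a predicted risk for one row.
    Both are supplied by the caller (hypothetical interface).
    """
    preds = [None] * len(X)
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = predict(model, X[i])
    return preds

# Toy stand-in model (assumption, for illustration only):
# it ignores the predictors and returns the training-set prevalence.
def fit_prevalence(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_prevalence(model, x):
    return model
```

In practice, `fit` and `predict` would wrap whatever risk model is being evaluated on the five predictors; the fold logic stays the same.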
The resulting prediction model for 5-year risk of osteoarthritis diagnosis demonstrated state-of-the-art discrimination (estimated AUROC 0.84) and good calibration (assessed visually). The model relies only on information that is readily available in Canadian primary care settings, and hence is appropriate for integration into Canadian primary care health information technology.
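For readers unfamiliar with the two performance measures reported above, the sketch below shows one common way to compute them from predicted risks and observed outcomes. It is an illustrative assumption, not the authors' code: `auroc` uses the pairwise definition (the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case), and `calibration_table` produces the grouped values that a visual calibration plot would typically display.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly.

    Ties count as half. This pairwise form is quadratic in the number
    of patients, so it is for illustration; a rank-based computation
    scales better on large cohorts.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def calibration_table(y_true, scores, bins=10):
    """Group patients by predicted risk and compare predicted with observed.

    Returns a list of (mean predicted risk, observed event rate) pairs,
    one per non-empty group, ordered from lowest to highest risk.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    rows = []
    for b in range(bins):
        chunk = pairs[b * n // bins:(b + 1) * n // bins]
        if chunk:
            mean_pred = sum(s for s, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            rows.append((mean_pred, observed))
    return rows
```

A well-calibrated model yields pairs that lie close to the diagonal when mean predicted risk is plotted against the observed event rate.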
If the contextual challenges that arise when using primary care electronic medical record data are appropriately addressed, highly discriminative models for osteoarthritis risk may be constructed using only data commonly available in primary care. Because such a model is constructed from data drawn from the same setting in which it is to be applied, internal validation provides strong evidence that it will perform well in its intended application.
Copyright © 2020 Elsevier B.V. All rights reserved.