The relationship between prodromal markers of Parkinson's disease (PD) and PD mortality is unclear. Electronic health records (EHRs) provide a large source of raw data that could be useful for identifying novel prognostic factors in PD. We aimed to provide a proof of concept for automated data mining and pattern recognition in the EHRs of PD patients and to study associations between prodromal markers and PD mortality.
Data from EHRs of PD patients (n = 2522) were collected from the Turku University Hospital database between 2006 and 2016. The data contained >27 million words/numbers and >750,000 unique expressions. The 5000 most common words were identified in the three-year period before PD diagnosis. Cox regression was used to investigate the association of expressions with the 5-year survival of PD patients.
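The word-frequency step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(patient_id, note_date, text)`, the regular-expression tokenizer, the function names, and the approximation of three years as 3 × 365 days are all assumptions made for the example.

```python
import re
from collections import Counter
from datetime import date, timedelta

# Assumption: a simple lowercase word tokenizer; the study's tokenization is not described.
TOKEN_RE = re.compile(r"\w+")


def prediagnostic_tokens(notes, diagnosis_dates, window_days=3 * 365):
    """Yield (patient_id, token) for notes dated in the window before diagnosis.

    Three years is approximated as 3 * 365 days (an assumption of this sketch).
    """
    for patient_id, note_date, text in notes:
        dx = diagnosis_dates.get(patient_id)
        if dx is None:
            continue
        if dx - timedelta(days=window_days) <= note_date < dx:
            for token in TOKEN_RE.findall(text.lower()):
                yield patient_id, token


def most_common_expressions(notes, diagnosis_dates, k=5000):
    """Return the k most frequent tokens in the prediagnostic window."""
    counts = Counter(tok for _, tok in prediagnostic_tokens(notes, diagnosis_dates))
    return [tok for tok, _ in counts.most_common(k)]


def exposure_flags(notes, diagnosis_dates, expressions):
    """Map each patient to True if any listed expression appears before diagnosis."""
    wanted = set(expressions)
    flags = {pid: False for pid in diagnosis_dates}
    for pid, tok in prediagnostic_tokens(notes, diagnosis_dates):
        if tok in wanted:
            flags[pid] = True
    return flags


# Hypothetical toy records, invented purely for illustration.
notes = [
    ("p1", date(2009, 5, 1), "Patient reports constipation and pain"),
    ("p1", date(2004, 1, 1), "Pain after fall"),        # older than the window
    ("p2", date(2011, 3, 1), "No pain, sleeps well"),
    ("p2", date(2012, 6, 1), "Tremor noted"),           # after diagnosis
]
diagnosis_dates = {"p1": date(2010, 1, 1), "p2": date(2012, 1, 1)}

top_words = most_common_expressions(notes, diagnosis_dates, k=5)
constipation_flag = exposure_flags(notes, diagnosis_dates, ["constipation"])
```

Per-patient flags of this kind could then serve as binary covariates in a Cox proportional hazards model, with time from diagnosis to death or censoring at five years as the outcome.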
During the five-year period after PD diagnosis, 839 patients died (33.3%). Patients whose records contained expressions associated with psychosis/hallucinations within 3 years before the diagnosis had worse survival (hazard ratio [HR] = 1.71, 95% CI = 1.46-1.99, p < 0.001). Similar effects were observed for words associated with cognition (HR = 1.23, 95% CI = 1.05-1.43, p = 0.009), constipation (HR = 1.34, 95% CI = 1.15-1.56, p = 0.0002) and pain (HR = 1.34, 95% CI = 1.12-1.60, p = 0.001).
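As a reading aid, the reported hazard ratios, confidence intervals and p-values can be cross-checked against one another. The sketch below assumes Wald-type intervals that are symmetric on the log scale and a two-sided normal test; the abstract does not state how the intervals were computed, so this is an assumption, and the function name `wald_check` is hypothetical.

```python
import math


def wald_check(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE(log HR), the z statistic and a two-sided p-value from a 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE), i.e. a Wald interval.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a standard normal
    return se, z, p


# Values as reported in the abstract: (HR, lower 95% bound, upper 95% bound).
reported = {
    "psychosis/hallucinations": (1.71, 1.46, 1.99),
    "cognition": (1.23, 1.05, 1.43),
    "constipation": (1.34, 1.15, 1.56),
    "pain": (1.34, 1.12, 1.60),
}

for name, (hr, low, high) in reported.items():
    se, z, p = wald_check(hr, low, high)
    print(f"{name}: SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under these assumptions the recovered p-values are of the same order as the reported ones. Small differences are expected because the published estimates are rounded to two decimal places.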
Automated mining of EHRs can predict relevant clinical outcomes in PD. The approach can identify factors that have previously been associated with survival and can detect novel associations, such as the link observed here between poor survival and prediagnostic pain. The significance of early pain in PD prognosis should be the focus of future studies using alternative methods.

Copyright © 2022 The Authors. Published by Elsevier Ltd. All rights reserved.