Atrial fibrillation (AF) is a major cardiovascular health problem: it is common and chronic, and it incurs substantial healthcare expenditure because of stroke. Oral anticoagulation reduces the risk of thromboembolic stroke in those at higher risk, but for a number of patients stroke is the first manifestation of undetected AF. There is therefore a rationale for diagnosing AF early, before the first complication occurs, yet population-based screening is not recommended. Previous prediction models have been limited by their data sources and methodologies. An accurate model that uses existing routinely collected data is needed to inform clinicians of patient-level risk of AF, inform national screening policy and highlight predictors that may be amenable to primary prevention.
We will investigate the application of a range of deep learning techniques, including an adapted convolutional neural network, a recurrent neural network and a Transformer, to routinely collected primary care data to create a personalised model predicting the risk of new-onset AF over a range of time periods. The Clinical Practice Research Datalink (CPRD)-GOLD dataset will be used for derivation, and the CPRD-AURUM dataset will be used for external geographical validation. Both comprise a sizeable representative population and are linked at patient level to secondary care databases. The performance of the deep learning models will be compared against classic machine learning and traditional statistical predictive modelling methods. To make the model more useful in clinical practice, we will use only risk factors accessible in primary care and will give the model the ability to update its risk prediction as it is presented with new data.
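As an illustration of how the model comparison described above might be scored on an external validation cohort, the following minimal sketch computes a concordance (c) statistic for two candidate models. The choice of the c-statistic is an assumption made here for illustration and is not a metric named in this abstract; the function name `c_statistic`, the labels `model_a` and `model_b`, and all numbers are hypothetical and do not represent study data or results.

```python
from itertools import product


def c_statistic(predicted_risk, outcome):
    """Concordance: probability that a randomly chosen case receives a
    higher predicted risk than a randomly chosen non-case (ties count 0.5)."""
    cases = [r for r, y in zip(predicted_risk, outcome) if y == 1]
    non_cases = [r for r, y in zip(predicted_risk, outcome) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c, n in product(cases, non_cases)
    )
    return concordant / (len(cases) * len(non_cases))


# Made-up external validation cohort: 1 = new-onset AF within the
# prediction horizon, 0 = no AF. Not real patient data.
outcome = [0, 0, 1, 0, 1, 1, 0, 1]

# Made-up predicted risks from two hypothetical models, each assumed to
# have been fitted beforehand on a separate derivation cohort.
risk_by_model = {
    "model_a": [0.05, 0.10, 0.40, 0.20, 0.70, 0.60, 0.15, 0.12],
    "model_b": [0.10, 0.30, 0.25, 0.20, 0.60, 0.50, 0.35, 0.15],
}

# Score every model on the same validation cohort so the values are comparable.
validation_c = {
    name: c_statistic(risk, outcome) for name, risk in risk_by_model.items()
}

for name, value in validation_c.items():
    print(f"{name}: c-statistic = {value:.4f}")
```

Scoring every candidate on the same held-out cohort keeps the comparison like for like; a fuller evaluation would typically also assess calibration, which this sketch omits.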
Permission to use the CPRD-GOLD and CPRD-AURUM datasets was obtained from CPRD (ref no: 19_076). The CPRD ethical approval committee approved the study. The results will be submitted for publication as a research paper in a peer-reviewed journal and presented at peer-reviewed conferences.
A systematic review, to be incorporated within the overall project, was registered on PROSPERO (registration number CRD42021245093). The study was registered on ClinicalTrials.gov (NCT04657900).

© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY. Published by BMJ.