The lack of effective, consistent, reproducible and efficient asthma ascertainment methods results in inconsistent asthma cohorts and study results for clinical trials or other studies. We aimed to assess whether application of expert artificial intelligence (AI)-based natural language processing (NLP) algorithms for two existing asthma criteria to electronic health records of a paediatric population systematically identifies childhood asthma and its subgroups with distinctive characteristics.
Using the 1997-2007 Olmsted County Birth Cohort, we applied validated NLP algorithms for Predetermined Asthma Criteria (NLP-PAC) as well as Asthma Predictive Index (NLP-API). We categorised subjects into four groups (both criteria positive (NLP-PAC/NLP-API); PAC positive only (NLP-PAC only); API positive only (NLP-API only); and both criteria negative (NLP-PAC/NLP-API)) and characterised them. Results were replicated in unsupervised cluster analysis for asthmatics and a random sample of 300 children using laboratory and pulmonary function tests (PFTs).
Of the 8196 subjects (51% male, 80% white), we identified 1614 (20%), NLP-PAC/NLP-API; 954 (12%), NLP-PAC only; 105 (1%), NLP-API only; and 5523 (67%), NLP-PAC/NLP-API. Asthmatic children classified as NLP-PAC/NLP-API showed earlier onset asthma, more Th2-high profile, poorer lung function, higher asthma exacerbation and higher risk of asthma-associated comorbidities compared with other groups. These results were consistent with those based on unsupervised cluster analysis and lab and PFT data of a random sample of study subjects.
Expert AI-based NLP algorithms for two asthma criteria systematically identify childhood asthma with distinctive characteristics. This approach may improve precision, reproducibility, consistency and efficiency of large-scale clinical studies for asthma and enable population management.

© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.