A natural language processing-powered annotation tool provided better inter-rater agreement and was faster than manual reviews for phenotyping cognitive status.


“Phenotyping cognitive status is a challenging task, as dementia is often undiagnosed, and identifying signs of cognitive decline in EHRs involves reading clinician notes and combining it with other information in a patient’s chart, such as their problem lists, medications, care coordination notes, and MRI orders,” Sudeshna Das, PhD, explains.

Dr. Das adds that clinicians often use an extensive range of terms and phrases that can easily be overlooked in manual reviews. “Natural language processing (NLP) has the ability to automatically detect cognition-related patterns and phrases, reducing the chance that the annotator might miss any information relevant to the decision-making task,” she says.
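To make the idea of automatically flagging cognition-related phrases concrete, here is a minimal sketch in Python. It is not the study's NLP pipeline, which the article does not describe in detail. It assumes a simple case-insensitive keyword match, and the term list, the sample note, and the function names `find_mentions` and `highlight` are all hypothetical.

```python
import re

# Hypothetical term list for illustration only; this is NOT the study's lexicon.
COGNITION_TERMS = [
    "memory loss",
    "forgetful",
    "confusion",
    "dementia",
    "cognitive impairment",
    "donepezil",
]

# One case-insensitive pattern that matches any listed term as a whole word or phrase.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in COGNITION_TERMS) + r")\b",
    re.IGNORECASE,
)


def find_mentions(note):
    """Return (start, end, matched_text) for each listed term found in a note."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(note)]


def highlight(note):
    """Wrap each matched term in double brackets so a reviewer can spot it quickly."""
    return PATTERN.sub(lambda m: f"[[{m.group(0)}]]", note)


# Made-up note text, not taken from any patient record.
note = "Daughter reports pt is increasingly forgetful; started donepezil last year."
mentions = find_mentions(note)
highlighted = highlight(note)
print(highlighted)
```

A fixed keyword list like this would miss misspellings, abbreviations, and negated mentions ("no confusion"), which is part of why a fuller NLP pipeline is used in practice.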

For a study published in the Journal of Medical Internet Research, Dr. Das and colleagues examined whether NLP-powered semi-automated annotation could improve the speed and inter-rater reliability of chart reviews for phenotyping cognitive status. Clinicians adjudicated the cognitive status of patients using either the semi-automated, NLP-powered annotation tool (NAT) or traditional chart reviews. Patient charts contained EHR data from two groups at Mass General Brigham: records for Medicare beneficiaries from the Mass General Brigham Accountable Care Organization (ACO data set) and records spanning the 2 years before a COVID-19 diagnosis through the date of that diagnosis (COVID-19 data set).

The researchers summarized diagnosis codes, medications, and laboratory test results, and ran clinical notes through an NLP pipeline. Cognitive status was rated as normal, impaired, or undetermined. The team then compared NAT with manual chart reviews on assessment time and inter-rater agreement for cognitive status phenotyping.

NAT Results Faster, With Better Consensus

Dr. Das and colleagues included 627 patients in the study (ACO data set, N=100; COVID-19 data set, N=527). Patients in the COVID-19 data set were less likely than those in the ACO data set to have an ICD code for dementia.

“NAT adjudication resulted in greater inter-rater agreement (Cohen κ, 0.89 vs 0.80) and was significantly faster (time difference: mean, 1.4 minutes; P<0.001) compared with manual chart reviews,” Dr. Das notes. “NAT adjudication provided evaluations that had stronger clinical consensus because of its integrated understanding of highlighted, relevant information and semi-automated NLP features.”
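For readers unfamiliar with the agreement statistic quoted above, Cohen κ measures how often two raters agree after correcting for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each rater's label frequencies. The Python sketch below computes it for two raters. The ratings are made up for illustration and are not study data, and the function name `cohen_kappa` is an assumption, not something from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal frequencies, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up ratings for ten charts (illustrative only, not study data).
rater_1 = ["normal", "normal", "impaired", "impaired", "undetermined",
           "normal", "impaired", "normal", "undetermined", "normal"]
rater_2 = ["normal", "normal", "impaired", "normal", "undetermined",
           "normal", "impaired", "normal", "impaired", "normal"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on 8 of 10 charts (p_o = 0.80), chance agreement is 0.41, and κ comes out near 0.66, which shows why κ is lower than raw percent agreement.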

The COVID-19 data set served as a case-control study that examined the association between pre-existing cognitive impairment and adverse outcomes related to COVID-19, Dr. Das explains. “It was used as an exemplar of a research cohort that required the labeling of cognitive status.”

The cognitive status for 21.1% of patients in the COVID-19 data set was undetermined, indicating that there was little information available in the EHR to ascertain cognitive status in this group.

Using NAT in Clinical Settings

The tool used in the study “is primarily intended for annotating research cohorts, but it may be used to identify patients with cognitive concerns who may not have a formal diagnosis in their charts,” according to Dr. Das.

“Tools that screen the EHR for warning signs and present the digested information to providers may prove to be an important step for early intervention,” she says. “In preoperative settings, baseline cognitive impairment is frequently missed and increases the risk for delirium in elderly patients by as much as five-fold. Early recognition of risk using our tool may allow preventative measures that reduce the incidence, severity, and/or duration of delirium. The tool may also be used in inpatient or emergency settings to reduce costs of routine screening.”

Further research is needed before NAT can be used in larger patient populations, she continues.

“While NAT improves the adjudicating of cognitive status compared with manual chart reviews, it is yet not scalable in large data sets with thousands of patients,” Dr. Das notes. “To scale to this extent, fully automated machine learning algorithms that duplicate the adjudication process are necessary. In future work, we plan to use NAT to develop gold-standard data sets for training and validation of such machine learning algorithms for phenotyping cognitive status.”