Synoptic reporting provides a mechanism for uniform, structured pathology diagnostics. This paper demonstrates the use of Perl regular expressions with alternation and grouping to classify electronic pathology reports generated by military treatment facilities. Eight Perl-based algorithms are validated, covering malignant melanoma, Hodgkin lymphoma, non-Hodgkin lymphoma, leukemia, and malignant neoplasms of the breast, ovary, testis, and thyroid.
Case-finding cohorts were developed using diagnostic codes for neoplasm groups and matched by unique identifiers to obtain pathology records. Preprocessing techniques and Perl-based algorithms were applied to classify records as malignant, in situ, suspect, or nonapplicable. A hand review then determined the accuracy of the algorithm classifications. Interrater reliability, sensitivity, specificity, positive predictive value, and negative predictive value were computed following abstractor adjudication.
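The classification step described above can be illustrated with a minimal sketch. It is written in Python, whose `re` module uses the same syntax as Perl for alternation (`|`) and non-capturing groups (`(?:...)`). The term lists, the whole-report negation rule, the rule order, and the function name `classify` are all assumptions made for illustration; they are not the validated algorithms from the study.

```python
import re

# Hypothetical term lists for illustration only; the study's actual
# patterns are not reproduced here.
DIAGNOSIS = r"(?:malignan(?:t|cy)|melanoma|carcinoma)"

# Negated findings, e.g. "no evidence of malignancy".
NEGATION = re.compile(
    r"\b(?:no|negative for|without)\s+(?:evidence of\s+)?" + DIAGNOSIS + r"\b",
    re.IGNORECASE,
)
# In situ disease, e.g. "melanoma in situ" or "carcinoma in-situ".
IN_SITU = re.compile(r"\b(?:melanoma|carcinoma)\s+in[\s-]situ\b", re.IGNORECASE)
# Hedged language, e.g. "suspicious for melanoma".
SUSPECT = re.compile(
    r"\b(?:suspicious|suspect|concerning)\s+for\s+" + DIAGNOSIS + r"\b",
    re.IGNORECASE,
)
# Any remaining mention of a malignant diagnosis.
MALIGNANT = re.compile(r"\b(?:melanoma|carcinoma)\b", re.IGNORECASE)


def classify(report_text):
    """Return one of: malignant, in situ, suspect, nonapplicable.

    Rules are checked from most to least specific, so a report that says
    "melanoma in situ" is not also counted as invasive melanoma.
    """
    if NEGATION.search(report_text):
        return "nonapplicable"
    if IN_SITU.search(report_text):
        return "in situ"
    if SUSPECT.search(report_text):
        return "suspect"
    if MALIGNANT.search(report_text):
        return "malignant"
    return "nonapplicable"
```

Checking the negation rule first across the whole report is a deliberate simplification: a report that states a malignant diagnosis and also mentions margins "negative for melanoma" would be misclassified, which is one reason a hand review of the output is needed.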
The specificity of the Perl-based algorithms was consistently high, over 98%. Very few benign results were classified as malignant or in situ by the Perl-based algorithms; the leukemia algorithm was the only one with a positive predictive value below 95%, at 91.9%. Three algorithms demonstrated a sensitivity below 80%: malignant neoplasm of the ovary (33.3%), leukemia (52.8%), and non-Hodgkin lymphoma (62.9%). The pathology records for these three groups showed substantial linguistic variation.
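The accuracy measures reported above are derived from a 2x2 comparison of each algorithm's output against the adjudicated hand review. The sketch below shows the standard formulas in Python; the function name `diagnostic_metrics` and the counts in the usage note are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute accuracy measures from 2x2 counts.

    tp: algorithm positive, hand review positive
    fp: algorithm positive, hand review negative
    fn: algorithm negative, hand review positive
    tn: algorithm negative, hand review negative
    A measure is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases found
        "specificity": ratio(tn, tn + fp),  # share of non-cases excluded
        "ppv": ratio(tp, tp + fp),          # share of positive calls that are correct
        "npv": ratio(tn, tn + fn),          # share of negative calls that are correct
    }
```

With hypothetical counts of `tp=80, fp=5, fn=20, tn=895`, sensitivity is 80/100 = 0.80 while specificity is 895/900, roughly 0.994. This shows how an algorithm can be highly specific and still miss a sizeable share of true cases.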
This paper demonstrates the utility of algorithm logic built around synoptic reporting for identifying neoplasms in electronic pathology results. Its major strength is the application of Perl-based coding in SAS, an accessible software application, to develop algorithms that remain highly specific across institutional variation in diagnostic documentation.
