Multiple commercial, open-source, and academic software tools exist for objective quantification of lung density in computed tomography (CT) images. The purpose of this study was to evaluate the inter-software reproducibility of CT lung density measurements.
CT images from 50 participants in the COPDGene cohort study were randomly selected for analysis, with n=10 participants in each Global Initiative for Chronic Obstructive Lung Disease (GOLD) grade (GOLD 0-IV). Academic-based groups (n=4) and commercial vendors (n=4) participated anonymously to generate CT lung density measurements using their software tools. Three measurements were included in the analysis: CT total lung volume (TLV), the percentage of low attenuation areas in the lung with Hounsfield unit (HU) values below -950HU (LAA), and the HU value corresponding to the 15th percentile of the parenchymal density histogram (Perc15). The inter-software bias and reproducibility coefficient (RDC) were generated with and without quality assurance (QA) for manual correction of the lung segmentation; intra-software bias and RDC were also generated from repeated measurements on the same images.
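The three measurements defined above can be illustrated with a minimal sketch. It assumes the lung has already been segmented and that the HU values of the lung voxels are available as a flat list; the function name `lung_density_metrics`, the `voxel_volume_ml` parameter, and the linear-interpolation percentile are illustrative assumptions, not the method of any software tool evaluated in the study.

```python
import math


def lung_density_metrics(hu_values, voxel_volume_ml, threshold_hu=-950.0, percentile=15.0):
    """Compute TLV (L), LAA (%) and Perc15 (HU) from segmented lung voxels.

    hu_values: HU values of the voxels inside the lung mask (assumed input).
    voxel_volume_ml: volume of one voxel in millilitres (assumed input).
    """
    if not hu_values:
        raise ValueError("hu_values must contain at least one voxel")

    n = len(hu_values)

    # Total lung volume: voxel count times voxel volume, converted from mL to L.
    tlv_l = n * voxel_volume_ml / 1000.0

    # LAA: percentage of lung voxels strictly below the threshold (-950 HU by default).
    laa_percent = 100.0 * sum(1 for v in hu_values if v < threshold_hu) / n

    # Perc15: HU value at the 15th percentile of the density histogram,
    # using linear interpolation between order statistics (one of several conventions).
    ordered = sorted(hu_values)
    pos = (n - 1) * percentile / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    perc_hu = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return {"tlv_l": tlv_l, "laa_percent": laa_percent, "perc15_hu": perc_hu}
```

Different tools may use different percentile conventions or threshold comparisons (strict versus inclusive), which is one plausible source of small inter-software differences even when the segmentation is identical.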
Inter-software mean bias was within ±0.22mL, ±0.46%, and ±0.97HU for TLV, LAA and Perc15, respectively. The inter-software RDC was 0.35L, 1.2% and 1.8HU for TLV, LAA and Perc15, respectively. Inter-software RDC remained unchanged following QA: 0.35L, 1.2% and 1.8HU for TLV, LAA and Perc15, respectively. All software tools investigated had an intra-software RDC of 0. The RDC was comparable between academic-based and commercial vendor-based software tools: 0.39L/0.32L for TLV, 1.2%/1.2% for LAA, and 1.7HU/1.6HU for Perc15. Multivariable regression analysis showed that academic-based software tools had greater within-subject standard deviation of TLV than commercial vendors, but no significant differences between academic and commercial groups were found for LAA or Perc15 measurements.
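As an illustration of how a reproducibility coefficient relates to the within-subject standard deviation, the sketch below uses the commonly cited definition RDC = 1.96 × √2 × wSD, where wSD is the square root of the mean within-subject variance across software tools. This formula and the function name `reproducibility_coefficient` are assumptions for illustration; the abstract does not state the exact estimator used in the study.

```python
import math
import statistics


def reproducibility_coefficient(measurements):
    """Estimate an RDC from repeated measurements of the same subjects.

    measurements: list of per-subject lists, each holding one value per
    software tool (or per repeat) for the same CT image.
    """
    if not measurements:
        raise ValueError("measurements must contain at least one subject")

    variances = []
    for values in measurements:
        if len(values) < 2:
            raise ValueError("each subject needs at least two measurements")
        # Sample variance of the values obtained for this subject.
        variances.append(statistics.variance(values))

    # Within-subject standard deviation: root of the mean within-subject variance.
    wsd = math.sqrt(sum(variances) / len(variances))

    # Assumed definition: 95% limit on the difference between two measurements.
    return 1.96 * math.sqrt(2.0) * wsd
```

Under this definition, a tool that returns identical values on repeated runs of the same image has a within-subject variance of zero for every subject, which yields an RDC of 0.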
CT total lung volume and lung density measurement bias and reproducibility were reported across eight different software tools. Bias was negligible across vendors, reproducibility was comparable for software tools generated by academic-based groups and commercial vendors, and segmentation QA had negligible impact on measurement variability between software tools. In summary, this study reports the amount of additional measurement variability that should be accounted for when using different software tools to measure lung density longitudinally with well-standardized image acquisition protocols. However, intra-software reproducibility was deterministic in all cases, so use of the same software tool is highly recommended to reduce variability in serial studies.