Multiple commercial and open-source software applications are available for texture analysis. Nonstandard techniques can introduce variability that impedes reproducibility and limits clinical utility. The purpose of this study was to measure the agreement of texture metrics extracted by 6 software packages. This retrospective study included 40 renal cell carcinomas with contrast-enhanced CT from The Cancer Genome Atlas and The Cancer Imaging Archive. Images were analyzed by 7 readers at 6 sites; each reader used 1 of 6 software packages to extract commonly studied texture features. Inter- and intra-reader agreement for segmentation was assessed with intraclass correlation coefficients. First-order (available in 6 packages) and second-order (available in 3 packages) texture features were compared between software pairs using Pearson correlation. Inter- and intra-reader agreement was excellent (ICC 0.93-1). First-order feature correlations were strong (r > 0.8, p < 0.001) between 75% (21/28) of software pairs for mean and standard deviation, 48% (10/21) for entropy, 29% (8/28) for skewness, and 25% (7/28) for kurtosis. Of 15 second-order features, only co-occurrence matrix correlation, grey-level non-uniformity, and run-length non-uniformity showed strong correlation between software packages (r = 0.90-1, p < 0.001). Variability in first- and second-order texture features was common across software configurations and produced inconsistent results. Standardized algorithms and reporting methods are needed before texture data can be used reliably in clinical applications, and variability related to software processing and configuration should be considered when reporting and comparing texture outputs.
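One well-documented source of the first-order discrepancies described above is configuration: histogram-based entropy depends on the chosen bin count (and logarithm base), and kurtosis may be reported in its raw or excess form. The sketch below is illustrative only, assuming synthetic intensities and arbitrary bin counts of 32 and 256; it is not drawn from any of the six packages studied, but shows how two defensible configurations produce different values for the same region of interest.

```python
import numpy as np
from scipy.stats import kurtosis

# Synthetic stand-in for ROI voxel intensities (not real CT data).
rng = np.random.default_rng(0)
roi = rng.gamma(shape=2.0, scale=30.0, size=5000)

def first_order_entropy(values, bins):
    """Histogram-based Shannon entropy (base 2).

    The bin count is a configuration choice that differs
    across texture-analysis packages.
    """
    counts, _ = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return -np.sum(p * np.log2(p))

# Same ROI, different binning, different "entropy".
print(first_order_entropy(roi, bins=32))
print(first_order_entropy(roi, bins=256))

# Kurtosis convention also varies between implementations:
# excess (Fisher, 0 for a normal distribution) vs. raw (Pearson, 3).
print(kurtosis(roi, fisher=True))
print(kurtosis(roi, fisher=False))
```

Because each package fixes these choices internally, two implementations can both be internally correct yet disagree numerically, which is consistent with the weak inter-package correlations reported for entropy, skewness, and kurtosis.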
