Quantitative CT texture-based method to predict diagnosis and prognosis of fibrosing interstitial lung disease patterns

doi:10.21203/rs.3.rs-224275/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Objectives

To utilize high-resolution quantitative CT (QCT) imaging features for prediction of diagnosis and prognosis in fibrosing interstitial lung diseases (ILD).

Methods

Forty ILD patients (20 with usual interstitial pneumonia (UIP) and 20 with non-UIP pattern ILD) were classified by expert consensus of 2 radiologists and followed for 7 years. Clinical variables were recorded. Following segmentation of the lung field, a total of 26 texture features were extracted using a lattice-based approach (TM model). The TM model was compared with a previously described histogram-based model (HM) in its ability to classify UIP vs non-UIP. For prognostic assessment, survival analysis was performed comparing the expert diagnostic labels with TM metrics.

Results

In the classification analysis, the TM model outperformed the HM method with AUC of 0.70. While survival curves of UIP vs non-UIP expert labels in Cox regression analysis were not statistically different, TM QCT features allowed statistically significant partition of the cohort.

Conclusion

The TM model outperformed the HM model in distinguishing UIP from non-UIP patterns. Most importantly, TM allows the cohort to be partitioned into distinct survival groups, whereas expert UIP vs non-UIP labeling does not. QCT TM models may improve diagnosis of ILD and offer more accurate prognostication, better guiding patient management.

Nuclear Medicine & Medical Imaging

Pulmonology

Biomedical Engineering

interstitial lung diseases

tomography

computer-assisted image analysis

classification

outcomes assessment

Introduction

Accurate classification and quantification of severity of chronic fibrosing interstitial lung diseases (ILD) is crucial for prognostication, management and assessment of treatment response [1, 2, 3, 4, 5]. ILDs are a heterogeneous group of diseases characterized by interstitial inflammation and variable degrees of fibrosis, with widely different morbidity and mortality, often leading to reduced lung volume, decreased lung compliance, restrictive physiology, and potentially, respiratory failure. Progressive fibrotic ILDs may require lung transplantation, a costly procedure with substantial morbidity and mortality, even though two drugs (pirfenidone and nintedanib) have been shown to decrease the rate of disease progression in usual interstitial pneumonia (UIP). Stable ILDs, on the other hand, can be managed conservatively, and ILDs without overt fibrosis may be reversible with anti-inflammatory therapy [3–11]. Pulmonary function tests (PFTs) are used clinically to assess the severity of functional impairment due to ILD by demonstrating a restrictive physiology (decreased FEV1 and FVC) with impaired gas exchange (low DLCO). PFTs are nonetheless limited, as they provide only a global assessment of lung physiology and cannot discern distinct ILD patterns with widely different prognoses.

High-resolution computed tomography (HRCT) is increasingly relied upon for more accurate characterization of ILDs. It has been shown that a UIP (usual interstitial pneumonia) pattern is associated with worse prognosis than non-UIP patterns of ILD, which include NSIP (non-specific interstitial pneumonia) and CHP (chronic hypersensitivity pneumonitis) [3, 4, 5, 10]. ILD diagnosis remains challenging and requires considerable expertise, to such an extent that the ground truth is not derived from radiology or pathology results alone, but rather from multidisciplinary discussion (MDD) consensus [3–11]. The current classification system is prone to substantial interobserver variability and limitations, with up to 25% of patients labeled as “unclassifiable” even after thorough review by an MDD expert team. For clinical drug trials, it would be highly desirable to have robust quantitative imaging biomarkers to reliably classify ILD and subsequently to assess therapy response and disease evolution objectively. Although many studies have demonstrated the diagnostic value of pattern analysis of HRCT scans [12–24], the existing methods are limited for several reasons: 1) most investigations have been limited to subjective, non-quantitative assessment of HRCT disease patterns; 2) many quantitative investigations have focused on simple imaging features, lacking a comprehensive characterization of complex or overlapping ILD texture patterns; and 3) automated classification results have been compared only with radiologist expert diagnosis, missing the opportunity to directly correlate quantitative imaging features with patient-centered outcomes such as overall survival.

Quantitative CT imaging (QCT) with whole lung segmentation, image feature selection and texture quantification has been suggested to improve disease classification [12–24]. Prior work has shown that imaging-derived CT metrics can outperform PFT parameters in distinguishing COPD from ILD patients, using computer-assisted classification via machine learning techniques such as support vector machines, with 98.1% accuracy [20]. Additional publications have suggested that automated quantification of several basic textures on thoracic HRCT can discriminate between UIP and NSIP (with accuracy of 82%) [14], as well as assess temporal changes in HRCT of patients with fibrotic interstitial pneumonias [13].

Our hypotheses are the following: 1) HRCT scans in ILD patients contain enough latent structural and functional information, which QCT driven texture based algorithms can compute in order to outperform simpler methods relying solely on histogram signatures, for disease classification; 2) Clusters of textures (phenotypes) derived from HRCT may allow for better prognostication, when compared to expert consensus diagnosis (UIP vs non-UIP patterns).

Materials and Methods

Patient Selection

This retrospective study design was approved by the institutional review board (IRB) of the University of Pennsylvania (#821679). This is a retrospective study in which patient data has been fully anonymized. Adequate precautions have been undertaken to ensure protection of patient privacy and confidentiality. The IRB judged there were no other risks to patients related to the performance of the study. Moreover, all study investigators are fully trained in HIPAA compliance. Consequently, the IRB at the University of Pennsylvania waived the need for informed consent in our study. All methods were carried out in accordance with relevant guidelines and regulations approved by IRB at the University of Pennsylvania. Through radiology and medical record searches, we retrospectively identified 40 patients using the following inclusion criteria: a) thoracic HRCT performed with ≤1.5 mm thickness contiguous axial slices and high spatial resolution algorithms; b) presence of fibrotic interstitial lung disease patterns on HRCT (UIP – usual interstitial pneumonia, NSIP – non-specific interstitial pneumonia or CHP – chronic hypersensitivity pneumonitis) by pathology or by a combination of clinical and radiological findings (Figure 1). HRCT datasets were anonymized and transferred to an imaging-processing computer cluster. Concurrently, PFT and clinical data were obtained from the electronic medical record, anonymized, and associated with specific imaging datasets. The dataset was balanced by design: we randomly selected 20 patients with UIP and randomly selected 20 patients with NSIP or CHP patterns of ILD, for maximizing statistical power, even with a relatively small sample size.

Expert Diagnosis

Subjective HRCT image analysis was performed in consensus by two expert thoracic radiologists (with 11 and 7 years of subspecialty expertise), to assess the presence, spatial distribution and severity of the following basic imaging patterns: reticulation, traction bronchiectasis, honeycombing, ground-glass opacities, consolidations, emphysema and normal parenchyma. Additional findings such as masses or pleural effusions were noted. Accordingly, each patient was classified into one of two groups: definite or probable UIP pattern, versus most compatible with non-UIP, which comprised chronic hypersensitivity pneumonitis and non-specific interstitial pneumonia patterns of ILD. Moreover, disease severity was stratified using a semi-quantitative score of mild (less than 25% of parenchyma involved), moderate (≥ 25% but ≤ 50% of parenchyma involved) and severe (> 50% of parenchyma involved). The dataset was balanced, with 20 patients in the UIP pattern group and 20 patients in the non-UIP pattern group. The demographic and clinical characteristics of these 40 patients are summarized in Table 1.

QCT analysis via histogram-based (HM) method

QCT analysis was initially performed using the IMBIO Lung Texture Analysis (LTA) (CALIPER) software, which is currently investigational in the United States and not FDA approved for clinical use. The first step of LTA is the segmentation of the lungs, followed by segmentation of the airway tree and pulmonary vessels. The next step is to apply the CALIPER [22] algorithm to the lung parenchyma, which uses computer vision-based image analysis of volumetric histogram features and 3D morphology to classify groups of voxels (each containing 15 1x1x1 mm voxels). The detection and quantification of lung parenchymal findings is based on histogram signature mapping techniques trained through expert radiologist consensus assessment of pathologically confirmed training sets obtained through the Lung Tissue Research Consortium (LTRC). The number of voxels in the lung parenchyma classified as each of the fundamental texture categories is calculated and converted to percentages of the combined left and right lung volume, the individual lungs, and the upper, middle, and lower sextants of each of the lungs. Voxels that are identified as vessels are not included in the calculation of the lung volume. LTA then generates as output a new series of DICOM images with multi-color, semi-transparent overlays to indicate the texture categories, and a PDF report that includes a graphic summary of the quantitative results, indicating the percentage of basic parenchymal image patterns (normal, ground-glass, reticulation, honeycombing, and hyperlucent), lung volumes and spatial distribution by lung zone.

QCT Analysis via Texture based (TM) method

Our segmentation method is a 3-dimensional, intensity-based algorithm that uses K-means clustering to determine the cluster centers of air/lung tissue versus soft tissue attenuation; the soft tissue cluster was removed from the segmented volume. CT attenuation of the lung parenchyma, measured in Hounsfield units (HU), lies between -900 and -700 HU for a normal subject, whereas interstitial fibrosis can raise the parenchymal attenuation to as high as -100 HU, making it impossible to segment the lungs solely by intensity thresholding. Our method provides an efficient automated lung segmentation (Figure 2), which was further refined with minor manual corrections (less than 10% volume change) by an expert radiologist in all cases where segmentation results were suboptimal (approximately 25% of the dataset); the remaining 75% of the dataset did not require any manual correction. All segmentation results were reviewed by an expert cardiothoracic radiologist for quality control, and following the minor manual corrections described above, the entire dataset was deemed optimally segmented.
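The clustering idea behind this step can be illustrated with a minimal sketch. It is not the in-house implementation: it assumes a flat list of voxel intensities instead of a 3D volume, uses plain Lloyd iterations with two clusters, and the names `kmeans_1d` and `lung_mask` and the toy HU values are hypothetical.

```python
def kmeans_1d(values, k=2, iters=50):
    """Lloyd's algorithm on scalar intensities; returns sorted cluster centers."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly across the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Keep the previous center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def lung_mask(hu_values):
    """True for voxels closer to the low (air/lung) center than to the soft-tissue center."""
    low, high = kmeans_1d(hu_values, k=2)
    return [abs(v - low) < abs(v - high) for v in hu_values]


# Toy HU samples (assumed): aerated lung, denser lung, chest-wall soft tissue.
hu = [-880, -850, -820, -790, -760, -600, -450, 20, 35, 40, 55, 60]
mask = lung_mask(hu)
```

In this toy example the cluster centers are estimated from the data rather than fixed in advance, which is the property the text contrasts with a fixed intensity threshold.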

QCT feature extraction was subsequently performed via an in-house lattice-based texture estimation software pipeline (Figure 3). This method, capable of capturing the texture heterogeneity of the ILD pattern [25], is based on a regular grid virtually overlaid on each CT image. Texture features are computed at the intersection (i.e., lattice) points of the grid lines within the lung, using a local cube (window) centered at each lattice point (the Appendix provides mathematical details). Using this strategy, a comprehensive set of 26 imaging features from three major statistical groups (gray-level histogram, co-occurrence, and run-length) was computed and saved as 3D feature maps. Features were calculated using a range of window (ROI) sizes from 4 mm to 20 mm. Different ROI sizes help to assess texture information at different spatial scales, from the finest to the coarsest textures, which may capture different levels of histologic change (from fine to coarse fibrosis). These features were averaged across each ROI to represent distinct texture phenotypes.
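As an illustration of the lattice idea only, the sketch below computes two features (one histogram feature and one co-occurrence feature) in a square window around each lattice point of a 2D image. The actual pipeline works on 3D cubes with 26 features; the function names, the 2D simplification, the horizontal-neighbor offset and the toy image are all assumptions made for this example.

```python
def lattice_points(height, width, spacing, half):
    """Grid intersections whose full window fits inside the image."""
    return [(r, c)
            for r in range(half, height - half, spacing)
            for c in range(half, width - half, spacing)]


def window(image, r, c, half):
    """Square patch of side 2*half + 1 centered at (r, c)."""
    return [row[c - half:c + half + 1] for row in image[r - half:r + half + 1]]


def mean_intensity(patch):
    """First-order (histogram) feature: average gray level in the window."""
    flat = [v for row in patch for v in row]
    return sum(flat) / len(flat)


def glcm_contrast(patch):
    """Co-occurrence contrast for offset (0, 1): the mean squared
    gray-level difference between horizontally adjacent pixels."""
    diffs = [(row[j] - row[j + 1]) ** 2
             for row in patch for j in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def feature_maps(image, spacing, half):
    """Map each lattice point to its (mean, contrast) feature pair."""
    points = lattice_points(len(image), len(image[0]), spacing, half)
    return {(r, c): (mean_intensity(window(image, r, c, half)),
                     glcm_contrast(window(image, r, c, half)))
            for r, c in points}


# Toy 8x8 image (assumed): uniform left half, finely striped right half.
image = [[2, 2, 2, 2, 0, 4, 0, 4] for _ in range(8)]
maps = feature_maps(image, spacing=4, half=1)
```

Here the uniform region yields zero contrast while the striped region yields a high value, even though their mean intensities are close. This is the kind of difference a histogram-only description tends to miss.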

With the 26 3D feature maps, a K-means clustering approach was applied to each of the 40 patients to group the voxels within the lung that share similar feature patterns (ILD sub-types). The choice of K, the number of clusters, is important. We used K = 5 for this study, modeling 5 tissue types: normal, ground-glass, reticular, honeycombing, and hyperlucent. The clustering results for two sample window sizes, 4 mm and 12 mm, are shown in Figure 4. The volume ratio of each cluster (the number of voxels in each tissue cluster divided by the total number of voxels) was calculated and then fed into a Support Vector Machine (SVM) model to assess the ability of QCT to predict UIP vs non-UIP diagnosis. Two sets of covariates were used in the model: 1) imaging features alone; 2) imaging features combined with relevant clinical measures such as age, gender, and severity of the disease. The TM-based classification models were then compared with the HM method for different window sizes (Figure 5).
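The volume-ratio step can be sketched in a few lines. The example assumes that each lung voxel already carries a cluster index from 0 to K-1 and that the indices have been matched across patients so that the resulting vectors are comparable; neither detail is specified in the text, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def cluster_volume_ratios(labels, k):
    """Fraction of lung voxels assigned to each of the k clusters."""
    counts = Counter(labels)
    total = len(labels)
    # Clusters that never occur in this patient get a ratio of 0.
    return [counts.get(c, 0) / total for c in range(k)]


# Toy label map for one patient (assumed): 10 voxels, cluster 3 is absent.
labels = [0, 0, 0, 0, 1, 1, 2, 2, 2, 4]
ratios = cluster_volume_ratios(labels, k=5)
```

Each patient is thus summarized by a vector of K ratios that sum to 1, which can then serve as the input row for a classifier.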

Although the sample size is relatively small, the extracted imaging features were also used to build a deep learning classification model with 5-fold cross-validation. A neural network with two hidden layers and 66 nodes was built with the PyTorch framework (see Supplementary materials for details).
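For a balanced cohort of 40 patients, 5-fold cross-validation leaves 8 patients in each held-out fold. The sketch below shows one way to build such folds. Stratifying by class, the seed and the function name are assumptions for illustration, as the text does not state how the folds were drawn.

```python
import random


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into folds that preserve the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


# 20 UIP (label 1) and 20 non-UIP (label 0) patients, as in the cohort.
labels = [1] * 20 + [0] * 20
folds = stratified_folds(labels)

# Each fold serves once as the test set; the other four form the training set.
splits = [(sorted(i for f in folds if f is not test for i in f), sorted(test))
          for test in folds]
```

With this scheme every model is trained on 32 patients and evaluated on 8, with 4 patients of each class in every test fold.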

Survival Analysis

The retrospective nature of our dataset allowed over 7 years of follow up, enabling correlation with relevant patient outcomes such as death, development of respiratory failure requiring ICU admission or need for lung transplantation.

Using the 26 extracted features, a Cox proportional hazards regression model [26] was fitted for time-to-event analysis of survival. The C-statistic was used as a measure of the predictive performance of the features. Two sets of covariates were considered: 1) UIP and non-UIP pattern labels assigned by the expert radiologists; 2) QCT imaging TM features, with relevant clinical features such as age, gender, and severity for each patient as additional covariates (Figure 6). To reduce the number of covariates and the potential for overfitting, the C-statistic for each feature was evaluated using a univariable Cox regression model with cross-validation.
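To make the C-statistic concrete, the sketch below computes a concordance index on toy data. It implements Harrell's pairwise definition, swapped in here for simplicity in place of the censoring-adjusted estimator of reference [26]. Pairs with tied times are skipped, and all names and values are assumptions for this example.

```python
def concordance_index(times, events, risks):
    """Harrell-style C: the fraction of comparable patient pairs in which
    the patient with the shorter observed survival has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable


# Toy follow-up data (assumed): time in years, event 1 = observed, 0 = censored.
times = [2, 4, 5, 7, 7]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.3, 0.1]
c_stat = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk against survival time.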

Results

Imaging features derived from TM and HM were fed into an SVM model to assess their performance in classifying UIP vs non-UIP patients. AUCs of the SVM models for different window sizes, with and without clinical measures, are shown for TM in Table 2. While HM with clinical variables (CVs) reached a maximum accuracy of 0.64, TM with CVs produced a maximum AUC of 0.745 (approximately 0.75) at a window size of 8 mm (Table 2, Figure 5). The neural network model yielded 75% accuracy with 5-fold cross-validation.
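For readers less familiar with the metric, the AUC reported here equals the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The sketch below computes it directly from that definition. The labels and scores are toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half), equivalent to the Mann-Whitney U statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed): label 1 = UIP, label 0 = non-UIP.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In the toy data, 8 of the 9 positive/negative pairs are ordered correctly, giving an AUC of about 0.89.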

Furthermore, survival analysis was performed to predict prognosis, comparing a model based on the UIP vs non-UIP pattern labels assigned by expert readers with a model based on TM imaging features adjusted for clinical variables. The Kaplan-Meier curves based on the TM classification model (window size = 8 mm) and on the expert labels are shown in Figure 6. The survival model using QCT imaging features with clinical covariates yielded a statistically significant separation (P-value < 0.03) and a higher C-statistic (0.73) compared with the model based solely on UIP vs non-UIP expert labels (P-value = 0.59, not statistically significant).
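The Kaplan-Meier curves referred to above are built from the product-limit estimator, which multiplies the conditional survival fractions at each observed event time. A minimal sketch on toy data follows. The function name and the follow-up values are assumptions, not study data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: a list of (time, survival probability)
    pairs, one per distinct time at which an event was observed."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    curve = []
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Toy follow-up (assumed): time in years, event 1 = death, 0 = censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the at-risk count up to their last follow-up but never trigger a drop in the curve, which is why the estimate steps down only at the event times 1, 2, 3 and 5 in this example.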

Discussion

The rationale for applying our TM algorithms is that simple HM algorithms, which were developed to identify basic CT imaging patterns of ILD, may fail to identify complex and overlapping patterns of interstitial fibrosis and inflammation. We postulated that higher-order statistics derived from TM models, which characterize complex imaging textures more specifically and more fully, may allow further information to be extracted from HRCT data in patients with fibrotic ILD. Our results support that hypothesis, demonstrating the superiority of TM algorithms for classification of ILD. Furthermore, we demonstrated the value of clinical information in improving the accuracy of any classification model based on QCT metrics, as both HM and TM models improved their classification performance when adjusted for clinical variables. It is also important to emphasize that there is an optimal window (ROI) size, which in our analysis was approximately 8 mm. Our interpretation is that, on the one hand, smaller window sizes (e.g. < 6 mm) are more susceptible to noise arising from differences in image reconstruction parameters and CT scanning technique, and may not capture spatial features that can only be detected over larger areas. On the other hand, larger window sizes (e.g. > 12 mm) may average out fine interstitial abnormalities and fail to capture complex texture patterns.

Our second hypothesis relied on analyzing patient outcome data (time to death, time to lung transplantation, time to ICU admission with respiratory failure) via Cox survival analysis to assess how well our TM method, which combines lower-level and higher-level texture features (such as honeycombing, reticulation, hyperlucent, consolidation, ground-glass, normal), predicts outcomes when compared with expert consensus diagnosis (UIP versus non-UIP). The TM approach separated the cohort into two distinct survival curves (P-value < 0.05), while expert labeling (UIP vs non-UIP patterns) resulted in no statistically significant difference. This is of utmost clinical relevance, as it not only showcases the limitations of the current ILD classification scheme, but also offers a new and more nuanced paradigm for assessing the extent of structural changes in the lungs, which can be captured by QCT but are not formally incorporated into existing classification schemes. Improved prognostication may allow more aggressive management of patients who are deemed more likely to deteriorate clinically, while obviating potentially risky procedures such as surgical biopsies in patients who are more likely to remain stable or even improve over time.

Our study has several limitations. First, it included a relatively small sample size, due to resource constraints. However, results showed that even with the relatively small sample size, radiomics can be useful for ILD prognosis and diagnosis given the complex nature of the disease. Notwithstanding our small sample size, it is crucial to emphasize that our results are statistically significant, demonstrating that the effects we have found are large enough to pass statistical significance tests even in this small sample size, and therefore deserve further investigation. In the future, we plan to extend our analysis to a much larger cohort of ILD patients. Second, automated segmentation results required minor manual expert correction in circa 25% of patients, especially in advanced ILD with noisy CT images, supporting the need for further technical development. This can be improved with implementing new deep learning-based segmentation approaches. Nonetheless, human revision and correction can help us to overcome this technical limitation in our study. Third, not all patients had pathology confirmation of diagnosis (though this is the usual clinical practice, as biopsies are not performed in high risk patients or patients with definite UIP pattern on HRCT). Last, the retrospective study design introduces variability in imaging acquisition protocols and clinical management (though this suggests that our method is robust enough to be able to measure a signal amidst the statistical noise from real world, non-uniform HRCT datasets, in standard clinical practice).

In summary, our results suggest that QCT lung parenchymal texture biomarkers derived from volumetric HRCT data may improve diagnosis, particularly by non-thoracic radiology experts, and even more importantly, may allow better prognostication of patients with ILD, ultimately contributing to better patient care, personalized management and treatment planning, and possibly improved long-term outcomes.

Acknowledgements:

The authors would like to acknowledge the contributions of the following collaborators:

Lauren Pantalone, BS, for collecting clinical information

Maya A. Galperin, MD, for analyzing HRCT scans with the lead author

Funding:

This research was supported by a Radiological Society of North America (RSNA) Research Seed Grant to the lead author (EB), # RSD1715. RSNA approved the study design as described in the grant proposal; however, it had no role in data collection, data analysis or manuscript writing.

Competing Interests

None of the authors has any conflict of interest to declare:

Babak Haghighi, Warren B. Gefter, Lauren Pantalone, Despina Kontos, Eduardo J. Mortani Barbosa Jr.

References

Hemingway, H. et al. Prognosis research strategy (PROGRESS) 1: a framework for researching clinical outcomes. BMJ. 346, 5595 (2013).
Rothwell, P. M. Prognostic models. Pract Neurol. 8 (4), 242–253 (2008).
Travis, W. D. et al. An Official American Thoracic Society/European Respiratory Society Statement: Update of the International Multidisciplinary Classification of the Idiopathic Interstitial Pneumonias. Am J Respir Crit Care Med. 188 (6), 733–748 (2013).
Lynch, D. A. et al. High-resolution computed tomography in idiopathic pulmonary fibrosis: diagnosis and prognosis. Am J Respir Crit Care Med. 172, 488–493 (2005).
Travis, W. D. et al. Idiopathic nonspecific interstitial pneumonia: prognostic significance of cellular and fibrosing patterns: survival comparison with usual interstitial pneumonia and desquamative interstitial pneumonia. Am J Surg Pathol. 24 (1), 19–33 (2000).
Lynch, D. A. et al. Idiopathic interstitial pneumonias: CT features. Radiology. 236 (1), 10–21 (2005).
Collard, H. R. & King, T. E. Jr Demystifying idiopathic interstitial pneumonia. Arch Intern Med. 163, 17–29 (2003).
Collins, C. D. et al. Observer variation in pattern type and extent of disease in fibrosing alveolitis on thin section computed tomography and chest radiography. Clin Radiol. 49, 236–240 (1994).
Camiciottoli, G. et al. Lung CT densitometry in systemic sclerosis, correlation with lung function, exercise test and quality of life. Chest. 131, 672–681 (2007).
Tafti, S. F. et al. Comparison of clinicoradiologic manifestation of nonspecific interstitial pneumonia and usual interstitial pneumonia/idiopathic pulmonary fibrosis: a report from NRITLD. Ann Thorac Med. 3 (4), 140–145 (2008).
Akashi, T. et al. Histopathologic analysis of sixteen autopsy cases of chronic hypersensitivity pneumonitis and comparison with idiopathic pulmonary fibrosis/usual interstitial pneumonia. Am J Clin Pathol. 131 (3), 405–415 (2009).
Best, A. C. et al. Quantitative CT indexes in idiopathic pulmonary fibrosis: relationship with physiologic impairment. Radiology. 228 (2), 407–414 (2003).
Yoon, R. et al. Quantitative assessment of change in regional disease patterns on serial HRCT of fibrotic interstitial pneumonia with texture-based automated quantification system. European Radiology. 23 (3), 692–701 (2013).
Park, S. O. et al. Comparison of usual interstitial pneumonia and nonspecific interstitial pneumonia: quantification of disease severity and discrimination between two diseases on HRCT using a texture-based automated system. Korean J Radiol. 12 (3), 297–307 (2011).
Depeursinge, A. et al. Comparative performance analysis of state-of-the-art classification algorithms applied to lung tissue categorization. J Digit Imaging. 23 (1), 18–30 (2010).
Huber, M. B. et al. Performance of topological texture features to classify fibrotic interstitial lung disease patterns. Med Phys. 38 (4), 2035–2044 (2011).
Hoffman, E. A. et al. Characterization of the interstitial lung diseases via density-based and texture-based analysis of computed tomography images of lung structure and function. Acad Radiol. 10 (10), 1104–1118 (2003).
Rosas, I. O. et al. Automated quantification of high-resolution CT scan findings in individuals at risk for pulmonary fibrosis. Chest. 140 (6), 1590–1597 (2011).
Boehm, H. F. et al. Automated classification of normal and pathologic pulmonary tissue by topological texture features extracted from multi-detector CT in 3D. Eur Radiol. 18 (12), 2745–2755 (2008).
Song, G. et al. A Comparative Study of HRCT Image Metrics and PFT Values for Characterization of ILD and COPD. Academic Radiology. 19 (7), 857–864 (2012).
Barbosa, E. M. Jr et al. Computational analysis of thoracic multidetector row HRCT for segmentation and quantification of small airway air trapping and emphysema in obstructive pulmonary disease. Acad Radiol. 18 (10), 1258–1269 (2011).
Bartholmai, B. J. et al. Quantitative CT Imaging of Interstitial Lung Diseases. Journal of Thoracic Imaging. 28 (5), 298–307 (2013).
Xu, Y. et al. Computer-aided classification of interstitial lung diseases via MDCT: 3D adaptive multiple feature method (3D AMFM). Acad Radiol. 13 (8), 969–978 (2006).
Kim, H. J. et al. Classification of parenchymal abnormality in scleroderma lung using a novel approach to denoise images collected via a multicenter study. Acad Radiol. 15 (8), 1004–1016 (2008).
Zheng, Y. et al. Parenchymal texture analysis in digital mammography: A fully automated pipeline for breast cancer risk assessment. Med Phys. 42, 4149–4160 (2015).
Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B. & Wei, L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 30, 1105–1117 (2011).
Bozdogan, H. Model selection and Akaike’s Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika. 52, 345–370 (1987).

Abbreviations List

UIP: usual interstitial pneumonia

NSIP: nonspecific interstitial pneumonia

CHP: chronic hypersensitivity pneumonitis

HM: histogram-based method

TM: texture-based method

QCT: quantitative computed tomography

HRCT: high resolution computed tomography

ILD: fibrosing interstitial lung diseases

CVs: clinical variables

ROI: region of interest

Table 1: Patient demographics, clinical characteristics, and outcome data

Characteristic	non-UIP pattern (N = 20)	UIP pattern (N = 20)
Demographic and clinical characteristics
Age (years)	60.0 (11.2)	60.9 (8.4)
Gender (female/male, %)	45.8/54.2	37.5/62.5
Disease severity (mild/moderate/severe, %)	33.3/50/16.7	37.5/37.5/25
Emphysema (no/yes, %)	83.3/16.7	68.8/31.2
Outcome data
Censored/deceased (%)	66.7/33.3	37.5/62.5
Biopsy (no/yes, %)	54.2/45.8	62.5/37.5

Values expressed as mean (SD) or number (%).

Table 2: AUCs for classification using TM with/without clinical data

Window size (W)	4 mm	6 mm	8 mm	10 mm	12 mm	14 mm	16 mm	18 mm	20 mm
AUC without clinical measures	0.669	0.698	0.693	0.656	0.656	0.591	0.536	0.568	0.594
AUC with clinical measures	0.695	0.732	0.745	0.682	0.646	0.677	0.544	0.568	0.594

No competing interests reported.
