Radiomic Phenotyping of the Lung Parenchyma: Feasibility Study in a Screening Cohort

Rationale: High-throughput extraction of radiomic features from low-dose CT scans can characterize the heterogeneity of the lung parenchyma and potentially aid in identifying subpopulations that may have higher risk of lung diseases, such as COPD, and lung cancer due to inammation or obstruction of the airways. We aim to determine the feasibility a lung radiomics phenotyping approach in a lung cancer screening cohort, while quantifying the effect of different CT reconstruction algorithms on phenotype robustness. showed two statistically signicant phenotypes (P < 0.05) with signicant difference for BMI and smoking pack-years. Radiomic analysis can be used to characterize lung parenchymal phenotypes from low-dose CT scans, which appear reproducible for different reconstruction kernels. Further work should seek to evaluate the effect of additional CT acquisition parameters and validate these phenotypes in characterizing lung cancer screening populations, to potentially better stratify disease patterns and cancer risk.


Introduction
Quantitative CT (QCT) imaging-based metrics, including radiomic features, can be an important tool for phenotyping lung diseases, such as COPD and interstitial lung diseases (1), and potentially also assessing the risk for developing lung cancer (2). For example, Raghu et al. (3) proposed an improved model for early prediction of lung cancer from clinical, demographic and low-dose CT (LDCT) data within a lung cancer screening cohort. In addition, Hawkins et al. (4) showed the radiomics of lung cancer screening LDCT at baseline can be used to assess risk of development of cancer. Castaldi et al. (5) identi ed four subgroups of smokers within the COPDGene cohort with unique clinical characteristics and COPD-associated genetic variants. Recently, Haghighi et al. (6) used a QCT imaging-based clustering approach to identify homogeneous clusters within current smokers with unique clinical phenotype characteristics.
The breadth of a radiomics-based approach could offer unique advantages in characterizing the heterogeneity of the lung parenchyma as an imaging biomarker of disease severity and/or the risk of developing lung cancer. Image acquisition and reconstruction can vary widely across different scanners, causing unwanted variation in extracted radiomic features (7), which can be a challenge in standardizing and translating such imaging biomarkers. Sha q-ul-Hassan et al. (7) investigated the reconstruction kernel-induced variability using noise power spectra as a correction factor to reduce variability in CT texture features. Zhao et al. (8) assess a comprehensive, commonly-used set of radiomic features from lung cancer patients and show that radiomic features can be reproducible over a wide range of imaging parameters, but smooth and sharp reconstruction algorithms can induce variability in radiomic features. Meyer et al. (9) have also shown that most radiomic features are highly affected by CT acquisition and reconstruction, resulting in non-reproducible features in liver lesions.
In this study we aim to establish the feasibility of a radiomics approach for characterizing intrinsic lung parenchymal patterns. Our main hypotheses are that lung cancer screening LDCT contain enough latent structural and functional information such that a set of comprehensive radiomic features can assess the intrinsic heterogeneity of the lung parenchyma, and that these phenotypes can be inherently robust to common CT acquisition parameters such as reconstruction kernels. Our long-term hypothesis is that these radiomic phenotypes can serve as precursors of lung diseases as well as to characterize the extent of such diseases and to identify patients at higher risk of developing lung cancer.

Human Data
The multicenter National Cancer Institute (NCI) Population-based Research to Optimize the Screening Process (PROSPR) lung cancer screening consortium (10) aims to address disparities in lung cancer mortality through research on the receipt and effectiveness of lung cancer screening within and across diverse healthcare systems and patient populations. Our study was designed as a single-institution feasibility study within the NCI PROSPR-Lungconsortium. This study design was approved by the institutional review board (IRB) of the University of Pennsylvania. Patient data was fully anonymized and adequate precautions were undertaken to ensure protection of patient privacy and con dentiality.

CT Acquisition Parameters
We obtained LDCT scans (n = 308) acquired with Siemens Healthineers scanners from patients undergoing routine lung cancer screening at our institution between 2015-2018, that had two different sets of image reconstruction kernels available (i.e., medium (I30f), sharp (I50f)) for the same acquisition (Two-kernel data set). Within the same institutional lung cancer screening cohort, we also identi ed an independent sample of patients screened who also had Pulmonary Function Test (PFT) data and COPD obstruction information available in their clinical record (n = 88) (PFT data set). Additional available clinical covariates for all patients included in our study were age, BMI, sex, Lung-RADs (11), smoking status, smoking pack-years and cancer diagnosis (i.e., biopsy con rmed cancer cases). Lung-RADs categories were collapsed into two groups based on scan ndings: Group A: negative scan (Lung-RADS 1 and 2) and Group B: positive scan (Lung-RADs 3/4A/4B/4X). The demographics and clinical information of the two independent data sets are summarized in Tables 1 and 2, respectively.

Radiomic phenotyping of the Lung Parenchyma
The lung eld in all LDCT images for these two datasets was segmented. Our segmentation method is an automated 3-dimensional, intensity-based algorithm using K-means clustering to properly determine cluster centers of air / lung tissue versus soft tissue attenuation. The lattice-based texture feature extraction pipeline (12) was applied to extract 26 three dimensional (3D) radiomic features from three major statistical approaches, gray-level histogram, co-occurrence, and run-length descriptors. Brie y, greylevel histogram features are rst-order statistics describing the distribution of gray-level intensities. Cooccurrence features consider the spatial relationships of pixel intensities in different directions and are based on the gray-level co-occurrence matrix that encodes the relative frequency of neighboring intensity values. Run-length features capture the coarseness of texture in speci ed directions by measuring strings of consecutive pixels that have the same gray-level intensity along speci c linear orientations (please see the supplementary section for feature de nitions). Different window sizes (W) from 4mm to 20mm were used to assess texture information at different spatial scales at each lattice point with an intent to evaluate different spatial levels of texture alterations. Furthermore, for each window size W, measures from lattice points were averaged over each 3D feature map to create a per-patient measure for each feature. This resulted in a feature vector of 26 features characterizing parenchymal complexity for each patient. The overall feature extraction pipeline is shown in Figure 1 and the schematic of the lattice approach is depicted in Figure 2.

Feature Harmonization
The extracted imaging feature vectors from both datasets (two-kernel and PFT) were harmonized to correct for differences in imaging parameters using ComBat (13). ComBat is a harmonization method originally developed for genomic datasets that can address and correct variation in imaging features due to heterogeneity in imaging parameters -such as reconstruction kernel -by assuming a location and spread variation in the distribution of each feature due to the imaging parameter value, using an empirical Bayes approach to estimate these location and spread parameters, and inverse transforming the values by these estimated parameters to harmonize the data. When it is expected that other nonfeature covariates will affect the feature values -for instance, if we expect the features to be associated with sex -these covariates can be speci ed as protected variables in the ComBat procedure, and variation due to these covariates will be preserved in the ComBat harmonization. Before applying ComBat, outlying texture feature values were identi ed, as previous work had shown that excluding outliers improved the effectiveness of ComBat harmonization. To identify outliers, in each dataset, each feature was residualized as the dependent variable in each of three univariable linear regression models, with age at index scan, BMI, and smoking pack-years as the predictors. Any image in which any of the three residuals, for any feature, was outside the range of the median ± 2.5 × IQR was tagged as an outlier. ComBat was used to harmonize features from the datasets both with and without the outliers dropped, using the Python neurocomBat package, with kernel as the batch effect for the two-kernel dataset and manufacturer as the batch effect for the PFT dataset. Sex, Lung-RADS score, smoking status, age, BMI, and smoking pack-years were protected variables in the ComBat harmonization.

Clustering and Statistical Analysis
With the extracted feature vectors for each patient, and for each window size W, an unsupervised hierarchical clustering approach was applied to the feature vectors extracted from each scan, and separately for each of the two reconstruction kernels, to group patients that share similar lunch parenchymal patterns. Therefore, the clusters of patients were derived for each reconstruction kernel. Consensus clustering was used to nd the optimal number of clusters for each reconstruction kernel (14). Entanglement parameters (15) showing the quality of the alignment between different trees of hierarchical clustering from the two kernels were computed (Figures S2). Entanglement is a measure between 1 (full entanglement) and 0 (no entanglement), where a lower entanglement coe cient corresponds to a better alignment between the clustering dendrogram structures.

Phenotype Associations with Demographics and PFT data
We evaluated associations between the identi ed radiomic lung parenchymal phenotype clusters with the available demographic and clinical co-variates. The Kruskal-Wallis and chi-square tests were used to assess differences from continuous and categorical variables, respectively, across phenotypes where P value= 0.05 was used as the threshold for determining signi cance in all tests. All data analysis was performed using the software R (version 3.1.1).
To assess the degree of reproducibility (validation) of cluster characteristics from one data set to another, the derived clusters in one data set (i.e., the two-reconstruction kernel dataset) were mapped to another data set domain (i.e., the independent PFT dataset). First, the centroids of two clusters from the Kernel data set were calculated. Then, the mapped clusters were assigned in the PFT data set by assigning each patient to the closest cluster centroid learned by the hierarchical clustering algorithm in the Kernel data set (please see supplementary section for details).

Feature Harmonization
Differences in feature distributions between kernel groups in the raw features and ComBat-harmonized features were assessed with the Kolmogorov-Smirnov (KS) test at a P value signi cance level of 0.05. KS testing on the feature distributions of the two-reconstruction kernel dataset prior to harmonization demonstrated statistically signi cant differences between features for kernel groups with different window sizes. Also the number of features with statistically signi cant differences decreased when residual outliers were dropped after harmonizing by ComBat (Table S1).
Similarly, for the PFT dataset features showed statistically signi cant differences between manufacturer groups prior to harmonization for different window sizes. After applying ComBat and outlier dropping, the number of features with statistically signi cant differences decreased (Table S2).

Two Clusters and Imaging-based Characteristics
A consensus clustering approach was applied to the harmonized imaging feature vectors, and the number of clusters K = 2 with a signi cant difference (P < 0.05) was selected as the optimal number based on their consensus matrices, which was consistent across the different window sizes (W) used for feature extraction (see supplementary materials for details). The optimal number of clusters was calculated using consensus clustering (Figures S1).
Heatmaps generated by radiomic features identi ed with two distinct parenchymal phenotype patterns across different window sizes for both reconstruction algorithms are depicted in Figure 3 and Figure 4, consistently showing two statistically signi cant phenotypes across both reconstruction kernels and all window sizes (P < 0.05). The number of patients differed slightly between kernels and windows sizes. The clustering results are tabulated in Table 3.
Entanglement parameters assessing the degree of similarity between clustering dendrograms were calculated to be 0.26, 0.16, 0.9 for W= 4, 8, and 20mm, respectively ( Figure S2). The smaller entanglement parameters indicate that the two clusters for the two different kernels are more like each other when using W= 4 and 8mm as compared to 20mm. Furthermore, the mapped PFT dataset also produced two statistically signi cant clusters (P <0.05) for different kernels and window sizes (Table 4).

Clusters Associations with Demographic and PFT data
Association between clusters (phenotypes) for the two reconstruction kernels with demographics is tabulated in Table 3 for the different lattice window sizes (W = 4, 8 and 20mm). BMI showed a signi cant difference between clusters (P < 0.05), consistently across all window sizes. Furthermore, smoking packyears showed reaching the signi cant level for W= 8 and 20mm. Cancer diagnosis showed signi cant difference for two clusters for smaller window sizes W = 4 and 8mm while this difference was not signi cant for W= 20mm (P > 0.05). Lung-RADs did not show any signi cant difference across the two phenotype clusters for any window size.
To assess reproducibility of the clustering approaches between the two datasets, the mapped clusters from the two-kernel data set to the PFT data set were assessed. Table 4 shows the associations between the clinical covariates for PFT dataset and their corresponding mapped clusters for different window sizes. The similar clustering approaches for the PFT data also showed two statistically signi cant phenotypes (P < 0.05). Association with the available PFT and clinical covariates for the different window sizes is shown in Table 4. While airway obstruction did not show signi cant difference, BMI and smoking pack-years demonstrated signi cant differences across clusters for all window sizes.

Discussion
Lung diseases, such as history of emphysema, chronic bronchitis, pneumonia and tuberculosis, are shown to in uence lung cancer risk, independently of tobacco use (16). One of the related hypotheses is that such diseases, which obstruct the air ow in the lung airways, are sources of in ammation in the lung tissue and may act as a catalyst in the development of lung neoplasms. Clinically established assessment exists for evaluating the extent of such diseases, including PFT, which evaluates degree of pulmonary impairment for example after respiratory infections, chronic bronchitis and can assess the severity of emphysema and COPD. However, most of these diseases are shown to be heterogeneous, both across and within patients, and such measures may be limited in capturing the extend of the in ammation and obstruction on the lung tissue, likely associated with differential cancer risk (17). LDCT offers a unique opportunity to characterize the heterogeneity of lung diseases conferring increased lung cancer risk using re ned quantitative imaging measures (18). Thus, the main premise of our study is that radiomic imaging features can aid in characterizing phenotypes of lung parenchymal heterogeneity from LDCT which may relate to underlying biologic heterogeneity of the overall lung structure that may increase lung cancer risk.
We applied unsupervised hierarchical clustering to a comprehensive set of LDCT radiomic features to establish feasibility of deriving intrinsic lung parenchymal phenotypes in a lung screening cohort, and further evaluated their reproducibility across different reconstruction kernels and feature extraction parameters such as window size. Furthermore, we applied our approach to an independent dataset with PFT information to assess the ability of our phenotypic approach in distinguishing patients with the same imaging and clinical characteristics. We found that these phenotypes were reproducible, and relatively robust across settings. This can be helpful when dealing with heterogeneous CT image data from with different acquisition parameters.
Cluster phenotype assignments were dependent on variation in LDCT reconstruction kernels and feature extraction parameters. The degree of similarity between clusters was evaluated using the entanglement parameter and the windows sizes with the smaller entanglement were considered as the optimal region of interest sizes in radiomic feature extraction process for decreasing phenotype sensitivity to variation of LDCT parameters. The phenotypes were also found to be reproducible in an independent dataset. For this study, window sizes of W=4mm and 8mm showed the lowest sensitivity to CT reconstruction parameters. This implies that window size parameter showing the degree of information that can be extracted from images at different spatial scales is important in evaluating different levels of lung texture alterations. The lower entanglement parameters, along with the fact that a smaller window size can potentially allow for extracting more re ned image texture information, suggests that smaller widow sizes (W=4mm and 8mm) can be the optimal for radiomic feature extraction and the corresponding phenotypes.
The two phenotypes across the different kernel and windows settings showed signi cance difference related with BMI. This could be because higher BMI (obesity) can contribute to higher levels of systemic in ammation, diabetes, and worse prognosis in many infectious conditions, as suggested by Sood et al. (19) and Joppa et al. (20), which can in turn affect our detected cluster characteristics. Interestingly, and considering that lung tissue in ammation may be a risk factor for the development of lung cancer, our results also showed signi cance difference for cancer diagnosis between the two clusters, which was a consistent observation across kernels. These two clusters also had signi cantly different smoking packyears for W=4mm which is another well-stablished risk factor for lung cancer diagnosis (22). Thus, together this data suggests that radiomic phenotypes may represent intrinsic lung parenchymal characteristics that may re ect underlying biologic underpinnings of the lung tissue predisposition to lung cancer, and may ultimately have value in augmenting risk assessment. Better identi cation of patients at high risk from lung cancer continues to be very important when prioritizing the best candidates for inclusion in lung cancer screening programs.
When applying our phenotyping approach using cluster mapping on the independent screening LDCT data set with PFT information, the degree of lung obstruction as measured by PFT did not show statistically signi cant differences, suggesting that radiomic phenotyping may capture complementary information to the current gold standard. However, smoking pack-years showed signi cant difference across the two phenotypes for smaller window sizes (W=4mm and 8mm). Nevertheless, cancer diagnosis and Lung-RADs did not reach signi cance, which may be due to relatively small number of cancer cases in this lung screening cohort.
Our study has several limitations. First, our work focused on mainly evaluating the effect of LDCT reconstruction kernel parameter on the extracted phenotypes, which is only one factor of the LDCT acquisition. Future analyses should also consider additional LDCT parameters such as dose, image resolution, and slice thickness (24). Second, for better assessment of phenotype stability over time, our analysis can be extended to available longitudinal data. Third, our study included a relatively small sample con rmed cancer cases, therefore future larger studies are needed, with additional clinical information such as history of pulmonary/vascular conditions, asthma, emphysema, and shortness of breath, and more extensive lung obstruction data, including the Fleischner Society emphysema grading system (25) in order to expand these analyses in more heterogeneous LDCT datasets and further evaluate potential associations between such possible phenotypes, lung diseases, and the risk for developing lung cancer.
In conclusion, our study demonstrated feasibility of leveraging a radiomics-based approach to identify potentially intrinsic phenotypes of lung parenchymal patterns in LDCT screening scans. We showed that such phenotypes are reproducible in an independent dataset, and are relatively robust when considering variations in LDCT reconstruction kernel and the resolution/scale of the radiomics feature extraction approach. We also demonstrated a signi cant association with these phenotypes and BMI and cancer diagnosis, which could represent a phenotypic manifestation of in ammation to the lung parenchymal structure.

Ethics Approval and Consent to Participate
Ethics and consent were approved by PROSPR committee.

Consent for Publication
The paper was approved by PROSPR publications and presentation committee.

Availability of Data and Material
Not applicable

Competing Interests
There is no con ict of interest for all authors including: Funding/Support This work was made possible by the Abramson Cancer Center Population Science Center of Excellence, which is funded by NCI P30 CA016520 and by Institutional Funds, and also NIH/NCI 5UM1CA221939, "center for research to optimize precision lung cancer screening in diverse populations".     Figure 1 The schematic of the LDCT analysis pipeline including lung eld segmentation, radiomic feature extraction, and unsupervised hierarchical clustering for phenotype generation.