The Value of CT Radiomics Nomogram Analysis of Parenchyma Surrounding Pulmonary Nodules in the Differentiation of Inflammatory Nodules from Tuberculous Nodules

Background: To investigate the value of a CT radiomics nomogram based on pulmonary nodules and the surrounding parenchyma in differentiating non-tuberculous inflammatory pulmonary nodules from tuberculous nodules.
Methods: A retrospective analysis was performed on 273 patients with pulmonary nodules confirmed by surgery and pathology in the Second Affiliated Hospital of Army Military Medical University from January 2015 to March 2021, including 164 cases of non-tuberculous inflammatory nodules and 109 cases of tuberculous nodules. The pulmonary nodule (ROI1), a 3 mm parenchymal band around the nodule (ROI2), and the nodule with external expansion (ROI3) were segmented, and radiomic features were extracted with A.K. software. The stability of the features was analyzed by the intraclass correlation coefficient (ICC), and the cases were divided into a training set and a validation set by random stratified sampling at a ratio of 7:3. The radiomics label was constructed after two rounds of dimensionality reduction, using the maximum relevance and minimum redundancy (mRMR) method followed by the least absolute shrinkage and selection operator (LASSO). Finally, multiple logistic regression analysis was conducted to select the optimal model among the three models, and a classification model including the radiomic features and clinical risk factors was established. The discrimination performance of the model was evaluated in terms of average area under the curve (AUC), accuracy, sensitivity, and specificity, and was independently validated in a validation cohort.
Results: There was a significant difference in the type of pulmonary nodules between the two groups (p < 0.05). The nodule with external expansion model had the best diagnostic performance and was combined with clinical features (nodule type) to construct a classification model. The AUC of both the training cohort and the validation cohort was 0.93. Decision curve analysis showed that the radiomics nomogram had good value in classifying the two categories of nodules.
Conclusions: The CT radiomics nomogram based on the nodule plus surrounding parenchyma performs well in distinguishing non-tuberculous inflammatory nodules from tuberculous nodules; it can provide a reference for clinical diagnosis and treatment and has good prospects for clinical application.


Background
The wide application of low-dose chest CT and artificial intelligence-assisted diagnosis has significantly improved the detection rate of pulmonary nodules [1,2]. Detected nodules may be benign lesions, such as inflammation and benign tumors, or malignant lesions, such as lung cancer and metastatic tumors. This uncertainty complicates accurate diagnosis by clinicians and places psychological pressure on patients [3]. Because pulmonary nodules differ in nature, their treatment options differ: malignant nodules usually require surgical resection; non-tuberculous inflammatory nodules require routine anti-inflammatory treatment; tuberculous nodules should be treated with anti-tuberculous drugs. Therefore, precise diagnosis of pulmonary nodules can effectively guide clinical treatment and avoid the waste of medical resources caused by overtreatment. Previous studies mostly focused on distinguishing benign from malignant pulmonary nodules, while few addressed the differentiation between non-tuberculous inflammatory nodules and tuberculous nodules. As a special type of inflammatory nodule, tuberculous nodules have a different pathological basis from non-tuberculous inflammatory nodules [4]. Although these two types of pulmonary nodules are difficult to distinguish by conventional CT diagnosis, radiomics can be used to identify them by extracting information about the nodules and the surrounding parenchyma. Radiomics is an emerging technology based on big data combined with computer-aided diagnosis (CAD) [5]. By mining medical images for large amounts of information that is invisible to the naked eye but of clinical value, it can support pathological classification or predict disease prognosis. In this study, high-throughput radiomic features were extracted from the two types of pulmonary nodules and their surrounding parenchyma. Multiple classification models were established, and the model with the best diagnostic performance was used to build a radiomics nomogram, with the aim of improving the accuracy of classifying inflammatory pulmonary nodules and providing a basis for precise clinical treatment.

Subject
This retrospective study was approved by the Medical Ethics Committee of the Second Affiliated Hospital of Army Medical University (2020-147-01). The requirement for informed consent was waived for all subjects. All methods were implemented in accordance with the approved regulations and the Declaration of Helsinki.
Clinical data and lung low-dose computed tomography (LDCT) imaging data of patients who underwent surgery for pulmonary nodules from January 2015 to March 2021 were retrospectively collected. Inclusion criteria: (1) the nodule was proved to be a non-tuberculous inflammatory nodule or a tuberculous nodule by surgery and pathology; (2) lung LDCT was performed preoperatively and showed no significant signs of pneumonia or other obvious signs of tuberculosis; (3) no invasive procedure such as puncture was performed before the LDCT examination; (4) the nodule had none of the radiographic features of typical benign nodules, such as calcification or cavity. Exclusion criteria: (1) preoperative LDCT was not performed in our hospital; (2) LDCT images showed other lesions or artifacts such as respiratory motion; (3) previous history of tuberculosis; (4) treatment such as radiation or chemotherapy before the LDCT examination.
During this period, a total of 2,768 patients underwent surgery for pulmonary nodules. Of these, 2,323 were pathologically confirmed to have malignant pulmonary nodules, and the rest had benign nodules, including tuberculosis, chronic inflammation, cryptococcosis and pulmonary hamartoma. Inflammatory pulmonary nodules accounted for a large proportion of these surgically treated benign nodules. In the end, 109 cases of tuberculous nodules and 164 cases of non-tuberculous inflammatory nodules were enrolled in this study.
Preoperative clinical and laboratory data were collected from the hospital information system (HIS). General clinical data included gender and age; inflammatory markers included white blood cell count, NEUT%, LYM%, MXD%, EO%, BASO%, C-reactive protein (CRP), and erythrocyte sedimentation rate; indicators of TB infection included the PPD test and the T-SPOT test; tumor markers included CEA, CA153, NSE, SCC, CYFRA21-1, SF, CA125, and ProGRP. Nodule type was collected from the Picture Archiving and Communication System (PACS). Nodule types comprised pure ground-glass nodules (pGGN), mixed ground-glass nodules (mGGN), and solid nodules (SN), determined with reference to the expert consensus on data annotation and quality control of pulmonary nodules [6,7].

Data acquisition
All subjects underwent lung LDCT examination on a GE Optima CT660 scanner (GE Healthcare, USA). The LDCT scanning parameters were as follows: tube voltage, 100 kV; automatic tube current modulation; field of view, 500 mm; detector collimation, 0.625 mm; slice thickness and spacing, 0.625 mm; acquisition matrix, 512×512; pitch, 1.375. All imaging data were reconstructed without interval using high-resolution and standard lung algorithms, and the reconstructed slice thickness was 5 mm. All CT images were de-identified and exported in DICOM format from PACS.

Image preprocessing and ROI segmentation
The axial LDCT images with 0.625 mm slice thickness of each subject were imported into the A.K. (Artificial Intelligent Kit, version 3.3.0, GE Healthcare) research platform. The original images were preprocessed as follows. First, the CT images were resampled to a spatial resolution of 1 mm × 1 mm × 1 mm. Then the CT images were standardized by rescaling the gray levels to a unified range of 0 ~ 255. After preprocessing, and with reference to the expert consensus on data annotation and quality control of pulmonary nodules [6], two radiologists (with 7 and 9 years of experience in chest CT diagnosis) segmented the nodules semi-automatically and manually corrected the nodule boundaries; the annotation leader and an arbitration expert (both senior physicians with more than 15 years of diagnostic experience) then reviewed and modified the segmentations to determine the final ROIs. ROI1 was the segmented pulmonary nodule, ROI2 was the parenchymal band obtained by expanding the nodule segmentation outward by 3 mm, and ROI3 was the pulmonary nodule plus the external expansion (Fig. 1A-F).
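The relationship between the three ROIs and the gray-level rescaling can be illustrated with a minimal Python sketch. It is not the A.K. implementation: the spherical (Euclidean) dilation, the min-max rescaling, and the function names `dilate`, `build_rois` and `rescale_gray` are assumptions introduced here for illustration, and voxels are assumed to be 1 mm isotropic after resampling.

```python
from itertools import product


def dilate(mask, radius_vox):
    """Return all voxels within radius_vox (Euclidean distance) of any voxel in mask."""
    r = radius_vox
    offsets = [
        (dz, dy, dx)
        for dz, dy, dx in product(range(-r, r + 1), repeat=3)
        if dz * dz + dy * dy + dx * dx <= r * r
    ]
    return {(z + dz, y + dy, x + dx) for (z, y, x) in mask for (dz, dy, dx) in offsets}


def build_rois(nodule_mask, margin_mm=3, voxel_mm=1):
    """Derive ROI1 (nodule), ROI2 (surrounding band) and ROI3 (nodule plus band)."""
    radius_vox = round(margin_mm / voxel_mm)
    roi1 = set(nodule_mask)            # segmented nodule
    roi3 = dilate(roi1, radius_vox)    # nodule expanded outward by the margin
    roi2 = roi3 - roi1                 # band only: expansion minus nodule
    return roi1, roi2, roi3


def rescale_gray(values, lo=0, hi=255):
    """Min-max rescale intensities to [lo, hi] (assumed form of the 0 ~ 255 step)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]


# Toy example: a single-voxel "nodule" at the origin.
roi1, roi2, roi3 = build_rois({(0, 0, 0)})
print(len(roi1), len(roi2), len(roi3))  # 1 122 123
```

By construction ROI2 and ROI1 are disjoint and their union is ROI3, which matches the description of ROI3 as the nodule plus its external expansion.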

Feature extraction
First, the preprocessed axial LDCT images of both groups were imported, followed by batch import of the three corresponding ROIs. A total of 851 radiomics features were selected for each segmented pulmonary nodule [9]. The results of two-dimensional wavelet decomposition reflect the frequency changes in different directions and the texture features of the image. All features were extracted using A.K. software.

Feature selection
Before feature selection, the intraclass correlation coefficient (ICC) was used to check the consistency of the radiomic features extracted by the two readers and thus ensure the stability of the extracted features. Features with an average ICC > 0.75 were retained for subsequent analysis [10]. All subjects were then divided into a training set and a validation set by random stratified sampling at a ratio of 7:3. Finally, each feature column was normalized to remove the effect of differing units. Two feature selection methods were applied to the radiomic features of each of the three ROI models. First, the maximum relevance and minimum redundancy (mRMR) method was used to reduce the dimension of the stable radiomic features by eliminating redundant and irrelevant features, that is, by maximizing the correlation between features and the classification variable while minimizing redundancy among features. The feature dimension was then reduced again with the least absolute shrinkage and selection operator (LASSO). The penalty parameter lambda was chosen by 10-fold cross-validation according to the minimum-error criterion, and the resulting optimal feature subset was used to build each of the three final models.
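The stratified 7:3 split and the column-wise normalization described above can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the function names, the random seed, the rounding of per-class counts, and the choice of z-score normalization with statistics estimated on the training set only are all assumptions.

```python
import random
from statistics import mean, pstdev


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices so each class keeps roughly the same train/test ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train_idx += idx[:n_train]
        test_idx += idx[n_train:]
    return sorted(train_idx), sorted(test_idx)


def zscore_fit(column):
    """Estimate mean and standard deviation of one feature column (training data)."""
    mu = mean(column)
    sd = pstdev(column)
    if sd == 0:
        sd = 1.0  # constant feature: avoid division by zero
    return mu, sd


def zscore_apply(column, mu, sd):
    """Apply previously fitted statistics to any column (training or validation)."""
    return [(v - mu) / sd for v in column]


# Class sizes taken from the study: 164 non-tuberculous (0) and 109 tuberculous (1).
labels = [0] * 164 + [1] * 109
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 191 82
```

Fitting the normalization on the training set and reusing it on the validation set keeps the validation data from influencing the model, which is one common way to preserve an independent validation.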

Comprehensive radiomics signature construction and validation
Logistic regression analysis was performed on the selected radiomics features to establish classification models for non-tuberculous inflammatory nodules and tuberculous nodules based on ROI1, ROI2 and ROI3, respectively. ROC analysis of the test data was used to independently validate the performance of each diagnostic model, and the best-performing model was chosen. A nomogram was established with the rms package in R to evaluate the risk of a pulmonary nodule being tuberculous. Decision curve analysis was used to calculate the net benefit at different threshold probabilities and thereby evaluate the clinical value of the nomogram, and the goodness of fit of the model was assessed with the Hosmer-Lemeshow test [11].
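The net benefit underlying a decision curve is commonly defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The Python sketch below uses that standard definition; the function names and the toy data are illustrative assumptions and do not reproduce the R analysis used in the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling a nodule tuberculous when predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: label every nodule as tuberculous."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = tuberculous, 0 = non-tuberculous inflammatory.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A decision curve plots these values over a range of thresholds; a model is considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.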

Statistical analysis
SPSS (version 22.0, IBM) was used for statistical analysis of the general data and laboratory indicators of the two groups of patients. Qualitative data were expressed as frequencies and compared with the χ2 test. Measurement data following a normal distribution were expressed as mean ± standard deviation and compared with the independent-samples t test. Measurement data not following a normal distribution were expressed as median (lower and upper quartiles) and compared with the nonparametric Mann-Whitney U test. In addition, radiomics feature selection, ROC curve drawing, and model construction and validation were all performed using R software (version 3.6.0; http://www.Rproject.org). All statistical tests were two-tailed, and P < 0.05 was considered statistically significant.

Demographic and neurocognitive
The general clinical data of the 273 patients are shown in Table 1. The difference in nodule type between the two groups was statistically significant (p < 0.05). No significant differences were found between the two groups in the other general and laboratory data (P > 0.05). Tumor markers did not differ between the two groups and are not listed, because the study aimed to analyze benign pulmonary nodules. Because CRP, erythrocyte sedimentation rate, and tuberculosis indicators were not tested in most patients, comparative analysis of these data was not performed.
Table 1 notes: # χ2 test was used for gender and nodule type; * nonparametric Mann-Whitney U test was used for data that do not follow a normal distribution; the two-sample t test was used for normally distributed data.

Feature selection results
ICC analysis retained 790 stable features, and mRMR then reduced each of the three ROI models to 30 radiomic features. Finally, LASSO regression with 10-fold cross-validation was used for screening, leaving 9, 10 and 11 features in the ROI1, ROI2 and ROI3 models, respectively (Fig. 2A-C). Once the number of features was determined, the most predictive feature subset was chosen and the corresponding coefficients were evaluated (Fig. 3A-C), and the RadScores of the training cohort and the test cohort were calculated for each model.
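The text does not give the RadScore formula. In radiomics studies it is conventionally the linear predictor of the LASSO-penalized logistic model, and that conventional form is assumed in the following sketch, where $x_j$ are the $k$ selected (normalized) features, $\beta_j$ their LASSO coefficients, $y_i \in \{0, 1\}$ the class label of case $i$, and $\lambda$ the penalty chosen by 10-fold cross-validation:

```latex
\[
\begin{aligned}
\mathrm{RadScore} &= \beta_0 + \sum_{j=1}^{k} \beta_j x_j, \\
(\hat\beta_0, \hat\beta) &= \arg\min_{\beta_0,\,\beta}
\left\{ -\frac{1}{N} \sum_{i=1}^{N}
\left[ y_i \left(\beta_0 + x_i^{\top}\beta\right)
- \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
+ \lambda \lVert \beta \rVert_1 \right\}
\end{aligned}
\]
```

Under this assumption, $k$ would be 9, 10 and 11 for the ROI1, ROI2 and ROI3 models, respectively, because the $\ell_1$ penalty shrinks the coefficients of the remaining candidate features to exactly zero.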

Comprehensive radiomics signature construction and validation
Comparative analysis showed significant differences in the RadScore distributions of class 0 and class 1 for all three models in both the training group and the testing group (P < 0.05) (Fig. 4A-C). Logistic regression analysis of the radiomics features was performed to obtain ROC curves for classifying the two types of nodules, and the test data were used for internal independent validation of each model (Fig. 5A-C). On this basis, the nodule plus external expansion model was taken as the optimal prediction model. The radiomic features of the optimal model were then combined with the clinical label (nodule type) to establish a tuberculosis prediction model; in the validation group, the area under the ROC curve (AUC) was 0.93 (95% confidence interval 0.88 ~ 0.98), accuracy was 0.870, sensitivity was 0.871 and specificity was 0.852, as shown in Fig. 6. Finally, a radiomics nomogram that included the radiomic features and the clinical risk factor (nodule type) was established to predict the risk of a pulmonary nodule being tuberculous (Fig. 7). Decision curve analysis showed that, at risk thresholds > 2%, the nomogram model had better clinical effectiveness than the radiomic feature model without the clinical label and the prediction model without the radiomics label (Fig. 8).
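For readers who want to see how the reported metrics are defined, the Python sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties counted as one half), and derives accuracy, sensitivity and specificity from a confusion matrix at a chosen cutoff. The function names, the cutoff and the toy scores are illustrative assumptions and are not the study data.

```python
def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, scores, cutoff):
    """Accuracy, sensitivity and specificity when score >= cutoff predicts class 1."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data (hypothetical): 1 = tuberculous, 0 = non-tuberculous inflammatory.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
print(auc(y_true, scores))
print(confusion_metrics(y_true, scores, cutoff=0.5))
```

The AUC is independent of the cutoff, whereas accuracy, sensitivity and specificity all depend on it, so the cutoff used should be reported alongside those three values.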

Discussion
Most recent studies of pulmonary nodules focus on discriminating benign from malignant nodules and pay insufficient attention to the classification and diagnosis of benign nodules [12,13]. To reduce medical waste under the requirements of precision medicine, it is necessary to further classify and diagnose benign nodules. The most common types of benign pulmonary nodules are tuberculous nodules and non-tuberculous inflammatory nodules [14]. The two types are indistinguishable to the human eye on CT imaging [15]. Therefore, in this study, a radiomics model was established to distinguish the two types of benign nodules, and analyses were performed at multiple levels, covering the nodules themselves and the surrounding parenchyma. The results showed that the classification model based on the nodule plus surrounding parenchyma had an AUC of 0.93, with high sensitivity and specificity, and showed the best clinical efficacy in decision curve analysis, which could support clinical diagnosis and treatment planning.
By extracting high-throughput features of pulmonary nodules from CT images, radiomics quantifies deep-level image features in order to improve the accuracy of classification and diagnosis [16]. In a recent study, researchers established a classification model for tuberculous nodules and lung adenocarcinoma using a radiomics nomogram with good accuracy [17]; the AUCs of the training set, internal validation set and external validation set were 0.889, 0.879 and 0.809, respectively. This shows that radiomics has clear advantages in classifying pulmonary nodules of different pathological types. In the general data of the two groups included in this study, the difference in nodule type was statistically significant: solid nodules were in the majority in the tuberculous group, while solid and pure ground-glass nodules accounted for the same proportion in the non-tuberculous inflammatory group. Nevertheless, classifying benign pulmonary nodules in practice remains considerably difficult. Therefore, the authors believe that image features can be mined to further refine the classification of benign nodules, so as to guide the accurate diagnosis and treatment of pulmonary nodules.
Tuberculous nodules are a special type of inflammatory nodule whose pathological basis is quite different from that of conventional inflammatory nodules [18]. Solitary tuberculous nodules are often misdiagnosed as lung cancer, inflammatory nodules or other lesions because they are relatively rare and lack the common features of tuberculosis [19]. Tuberculous nodules are essentially granulomatous lesions, consisting mainly of fibrous tissue and its surrounding caseous necrosis, surrounded by granulation tissue [20]. Due to the proliferation of granulation tissue and the binding of the bronchial wall, there is less marginal infiltration. Inflammatory nodules result from a series of pathophysiological changes caused by pathogenic bacteria, such as increased vascular permeability, inflammatory cell infiltration, serous exudation, and granulation tissue hyperplasia. Therefore, the edges of these nodules are usually blurred and peripheral vascular congestion is increased [21]. Inflammatory nodules are generally considered to be of uniform density, with bronchial aeration absent or rare. Smooth margins, shallow lobulation or long, thick spiculations are more common in benign lesions [22]. The two types of nodules therefore differ significantly in their central pathologic structure and in peripheral exudation or microvascular infiltration. On this pathological basis, the radiomics models in this study were established at three levels: the nodule itself, the parenchymal zone around the nodule, and the nodule plus external expansion, which also placed higher demands on nodule segmentation.
Image segmentation plays a key role in radiomics studies [23]. Manual, semi-automatic and automatic segmentation are the commonly used methods. A recent study showed that semi-automatic segmentation yielded more reliable radiomics parameters from isolated pulmonary nodules and could provide objective and stable information for the classification model [24]. Other studies incorporated features around pulmonary nodules into deep learning tools for evaluating benign and malignant pulmonary nodules, and found that the prediction model incorporating a band of surrounding parenchymal tissue with a width of 1/4 of the nodule diameter had the best efficacy [25]. In this study, semi-automatic segmentation with manual correction of the nodule boundary was used to define the pulmonary nodule as ROI1; the 3 mm band of parenchyma around the nodule, obtained with the expansion function of the A.K. platform, was defined as ROI2; and the nodule plus expansion was defined as ROI3. Accurate segmentation was achieved in this way.
The wavelet transform can concentrate the energy of the original image in a small number of wavelet coefficients, and the decomposed wavelet coefficients have high local correlation in the detail components in three directions, which provides favorable conditions for feature extraction [26]. Many studies have shown that wavelet transforms can effectively remove stripe noise in CT and MRI images, which gives them clear advantages in radiomics classification and predictive analysis, and they have been widely used in radiomics and texture analysis [27,28]. In this study, most of the selected features are wavelet-transformed features. In the best-performing ROI3 model, 10 of the 11 radiomic features are wavelet features and the remaining one is an original shape feature; together they cover a nodule shape feature, histogram skewness, and high-order features (GLCM, GLDM and GLRLM). The GLCM and GLDM texture features display the texture information of nodules better than the histogram [29], and GLRLM is effective in characterizing the consistency of nodules or speckled textures [30]. The features mined in this study can reflect the texture differences between the two groups of nodules from different angles and dimensions of the image. At the same time, the nodule plus external expansion model had the highest diagnostic efficacy, which supports the view that the two types of nodules differ in the pathological basis of the nodules themselves and of the infiltration of the surrounding parenchyma. This difference is reflected in the image: radiomic features that are invisible to the naked eye can distinguish the two types of nodules well.
Multiple logistic regression analysis has shown good performance in classification modeling [31,32]. A recent study used this method to establish a model for differentiating benign from malignant pulmonary nodules, and the AUC reached 0.89 [33]. In this study, the ROC curves of the training cohort and the validation cohort were modeled with the 11 radiomic features of the nodule plus external expansion (ROI3), and the AUC was 0.93 in both cohorts. It is worth noting that the AUC of both cohorts was still 0.93 after the radiomic features were combined with the clinical risk factor (nodule type), but decision curve analysis (DCA) showed that the model combined with radiomic features had good clinical efficacy even at low risk thresholds. In this analysis, although the nodule types observed by the naked eye on conventional imaging differed significantly between the two groups, they did not play a decisive role in the classification model. In contrast, the radiomic features obtained after dimensionality reduction were useful for distinguishing non-tuberculous inflammatory pulmonary nodules from tuberculous nodules [34]. Therefore, the radiomics nomogram model in this study can classify benign pulmonary nodules into non-tuberculous inflammatory nodules and tuberculous nodules, which provides a reference for judging the pathological type of benign pulmonary nodules before surgery and has a certain guiding value.
This study has the following limitations. First, the sample size is not large enough, which may affect the performance of the radiomics model; future work should focus on large-scale, multi-center studies. Second, laboratory data were available for too few of the included cases, and important tuberculosis laboratory tests were often missing. This also relates to the purpose of this study: if nodules can be identified as tuberculous at an early stage, the relevant clinical laboratory examinations can be recommended. Third, although radiomics is widely used in medicine, the stability and redundancy of features remain a concern. In future studies, these limitations will be reduced by expanding the sample size, establishing a standard database, integrating more laboratory indicators, and adopting artificial intelligence to segment nodules automatically, so as to make the classification model more efficient and applicable.

Conclusion
In summary, with both practicality and accuracy in mind, this study established three radiomics models, based on the pulmonary nodule, the surrounding parenchyma, and the nodule plus expansion, compared the AUCs of the three models, and found that the nodule plus expansion model is the optimal radiomics model for the classification and diagnosis of benign pulmonary nodules. The results of ROC and DCA analysis showed that, compared with traditional imaging methods, this model has better predictive and classification value and is expected to provide a basis for the accurate diagnosis and treatment of pulmonary nodules.

(A-C) Left panels: radiomic features were selected at the lowest binomial deviance. The coefficient λ was selected in the LASSO regression model using 10-fold cross-validation. Dotted vertical lines were drawn at the optimal values according to the minimum criterion and the 1 standard error of the minimum criterion. Right panels: LASSO coefficient profiles of the radiomic features. A dotted vertical line was drawn at the optimal λ selected using 10-fold cross-validation.

The ROC curves of the training set and the validation set for predicting the risk of tuberculous nodules.

Abbreviations
The red curve represents the nodule plus external expansion model of radiomics labels combined with the clinical risk factor (nodule type), with an AUC of 0.93. The blue curve represents the radiomics label prediction model used alone, with an AUC of 0.93. The green curve represents the clinical data prediction model used alone, with AUCs of 0.67 and 0.69.

Radiomics nomogram. Risk factors included pulmonary nodule type and Radscore, where nodule type 1 = pGGN, 2 = mGGN, and 3 = SN. Each patient's values are located on the Radscore and nodule type axes, and vertical lines are drawn to obtain the corresponding score values. The score values are added together and located on the total score axis, and a vertical line is drawn down to the risk coefficient axis. The risk coefficient is used to predict the degree of risk of a tuberculous nodule.

Figure 8
Decision curve analysis of the predictive model in all patients with pulmonary nodules. The decision curve represents the efficacy values at different risk thresholds. When the risk threshold is greater than 2%, predicting the risk of tuberculous nodules with the nomogram model is superior to identifying all nodules as tuberculous nodules or as non-tuberculous inflammatory nodules, and is also superior to the prediction method without the radiomics label.