A New Scoring System Combining A Four-Section Honeycomb Lung Percentage on HRCT and Other Comprehensive Multiparameter for Evaluating Pulmonary Fibrosis Severity

How to accurately assess IPF severity and predict prognosis remains a problem. This study aimed to develop a new method, which can be easily used to assess pulmonary brosis severity. were 74.0 (95%CI: 57.4-88.6), 72.0 (95%CI: 58.2-83.4), and 70.1 (95%CI: 55.8-81.9), respectively. The one-, two-, and three-year Brier values were 8.8 (95%CI: 4.9-13.8), 16.8 (95%CI: 12.5-22.4), and 19.5 (95%CI: 15.2-25.1), respectively. The GAP model-predicted AUCs of one-, two-, and three-year death risk were 73.9 (95%CI: 58.4-87.7), 72.3 (95%CI: 58.8-83.4), and 70.8 (95%CI: 57.8-82.9), respectively. The one-, two-, and three-year Brier values were 8.9 (95%CI: 4.8-13.9), 16.9 (95%CI: 13.0-22.5), 19.7 (95%CI: 15.3- 25.0), respectively. The CTPF model shows the best AUC value, Brier value, and stability, indicating the best predictive effectiveness.

The currently available IPF severity scoring methods included major four methods: 1) the clinicalradiographic-physiologic (CRP) scoring method published in 1986 by Leslie C. Watters et al [4,5]. This method uses too many variables, so it is a complex calculation method. In 2001, Talmadge E. King et al. [6] improved the CRP scoring method by including additional parameters, which further increases the complexity of this assessment method. 2) In 2002, Athol U. Wells et al [7] proposed a composite physiologic index (CPI) method to assess interstitial lung disease (ILD) severity by combing chest computed tomography (CT) results and pulmonary function parameters. However, they did not use this method to predict the death risk and the calculation formula of the CPI method is complex, which limits its adoption in clinical practice. 3) Brett Ley, MD et al. [8] suggested a gender, age, and physiologic (GAP)based method, which is based on the data of gender, age, forced vital capacity (FVC), and he ratio of diffusing capacity of the lung for Carbon Monoxide (DLco). However, the GAP method does not include critical parameter such as chest CT. Thus the assessment accuracy of the GAP method is compromised. 4) Japanese researcher Ryo Okuda et al [9] proposed to use only two arterial blood gas indicators, arterial partial pressure of oxygen (PaO 2 ) and oxyhemoglobin saturation (SaO 2 %), to assess IPF severity. Thus this method was too simple to access the disease severity. In 2017, Hasti Robbie et al [10] analyzed the contributions of physiological parameters, histopathological parameters, imaging parameters, biomarkers to the assessment of IPF severity and concluded that using a single type of parameters to assess IPF severity has serious limitations.
In this study, based on the available scoring methods, we chose parameters that have been proven to have a good prognostic value and can be acquired easily in clinical practice to develop a new scoring method to assess pulmonary brosis severity (patent application number: 201910514972.5).

Development of a New Scoring Method
Pulmonary Fibrosis Staging by Chest High-resolution Computed Tomography (HRCT) (CT-based brosis staging, Fig. 1) Based on the latest 2018 IPF guidelines [1], The severity and area of the lesions showing on chest CT images are important predictor of IPF mortality [11]. Previous imaging studies on interstitial pneumonia and IPF have proposed that honeycomb was the best imaging characteristic to predict the survival and prognosis of patients.er. Moreover, honeycomb and stretch bronchiectasis are the most representative imaging manifestations of pulmonary brosis [12][13][14]. Therefore, we choose the pathological range of honeycomb and traction bronchiectasis to evaluate the extent of pulmonary brosis.
Traditional manual evaluation method only included three lung CT sections representing top, middle and lower area to estimate the extent of entire lung. However, the honeycomb lung of IPF usually in the lower lungs. Thus we referred the theories proposed in the previous studies [15,16] and calculus principles to design a "four-section honeycomb lung percentage" method. We selected the following four representative lung CT sections to semi-quantitatively estimate the extent of honeycomb lesion in the entire lung: the aortic arch section, the tracheal bifurcation section, the section of basal (dorsal) segment of the tracheal bifurcation at the inferior lobes, and the section below the right lung apex. Each section included both the left and the right lungs. The largest transverse diameter line of each lung section was evenly divided into three parts, and then the lung section was divided into inner, middle, and outer sections by drawing lines starting from the dividing points alone the shape of patient's thorax. The outer lung section was then evenly divided into 6 small areas; the middle lung Sect. 4-5 areas (5 areas for a large middle section); the inner lung Sect. 2 areas. Thus, in total, the 4 CT sections comprised 8 lung sections (4 left + 4 right lung) and were evenly divided into approximately 100 small areas (12 or 13 per lung section × 8). Each small area was scored as 1 when there was positive honeycomb lesion and traction bronchiectasis in the area, and the total score of the entire lung was used as the total honeycomb lung score. The honeycomb lung percentage was calculated as: (total honeycomb score + total traction bronchiectasis score) ÷ total number of the small areas × 100%. For example, if 8 lung sections were evenly divided into 100 small areas and 30 of them were scored as positive honeycomb lung or traction bronchiectasis, then the honeycomb lung percentage was 30%. According to Lynch et al [17], lung brosis can be staged based on the following lung CT characteristics: stage I: there is reticular and linear shadow but no honeycomb lesion; stage II: honeycomb lesion area is < 25% of the entire lung; stage III: honeycomb lesion area is 25%-49%; stage IV: honeycomb lesion area is 50%-75%; stage V: honeycomb lesion area is > 75%.

Assess Pulmonary Fibrosis Severity by Using Multiparameter-based Comprehensive Scoring Method
Patients' baseline physiological condition and lung function parameters are important predictors for survival [7][8][9]18]. We compared the advantages and disadvantages of the existing pulmonary brosis severity scoring methods (Table 1)and chose the 5 parameters that are of important predictive values and are relatively easy to be collected in clinical practice: FVC%pred, DLco%pred, oxygen saturation of peripheral blood (SpO 2 %), age, and gender. We used the 5 parameters to evaluate the disease severity. We followed the previous studies [4][5][6][7][8][9] to de ne a multiparameter-based (parameters of pulmonary function and physiological condition, PF-based grading) comprehensive scoring criteria to estimate disease severity. We then combined this PF-based grading method with the CT-based pulmonary brosis staging method to develop a new scoring method (CTPF) to assess pulmonary brosis severity (Table 2). HRCT: high-resolution computed tomography.

Scoring the Clinical Data
Two radiologists used the CT-based pulmonary brosis staging method described above to evaluate patients' chest HRCT images. The average scores from the two radiologists were used as patients' nal lung brosis scores, and then the scores were used to stage pulmonary brosis according to the criteria described in Table 2. Patients' age, gender, FVC%pred, DLco%pred, and SpO 2 % were scored according to the criteria in Table 2, and the total scores were used to estimate PF-based disease severity according to the criteria in Table 2. The de nition of disease severity is: score 0-3 for grade (a) mild; score 4-6 for grade (b) moderate; score 7-10 for grade (c) severe. The CT-based stage and the PF-based severity were combined to determine patients' CTPF stage (Examples are presented in Fig. 3A and 3B).

Statistical Analyses
Measurement data are expressed as mean ± standard deviation (SD). Count data are presented as percentage (%) or proportion (%). Intra-group correlation coe cient was calculated to estimate the CT score consistency between the two radiologists [19,20]. Spearman correlation coe cient was calculated to analyze the correlation between CT-based brosis scores and pulmonary function parameters (FVC%pred, DLco%pred, SpO 2 %) and CPI index. The competition risk (Fine-Gray) model was used to analyze the relationship between prognosis (cumulative mortality) and the CT-based brosis stage and the PF-based severity grade [21]. Patients' survival period was de ned from the time when patients' data were acquired to the time of death endpoint or the last follow-up visit. The time unit was month. The death endpoint of this study was de ned as the death caused by lung diseases (IPF exacerbation or IPF combined with lung cancer). Lung transplantation is considered to be the most effective treatment for patients with IPF, so the occurrence of lung transplantation was considered as a competitive risk event in this study [22]. Other types of data were treated as censored data.
We used the following strategies to develop and evaluate disease prognosis prediction models: (1) Considered lung transplantation occurrence as a competitive risk event and used CT-based stage, PFbased grade, and CTPF comprehensive stage as predictors. To estimate the accuracy of prediction models, we included the GAP staging method proposed by Brett Ley, MD et al [8] in our analysis. We used all the data and the Fine-Gray regression analysis to establish 4 death-risk prediction models: CT-based brosis stage model, PF-based severity grade model, CTPF combined stage model, and GAP stage model.
(2) The Bootstrap cross-validation method was used to validate the predictive effectiveness of the 4 models, and the validation was repeated 1000 times to obtain the following average indexes of model prediction accuracy: area under the ROC curve (AUC), Brier score, and a calibration curve. The AUC value re ects the discrimination of the models. It is generally accepted that the model has a satisfactory discrimination to death risk from a disease when AUC is > 75%. The calibration curve re ects the consistency between the predicted risk and the actual risk. The Brier scores re ect both the discrimination and calibration of a model. The smaller the Brier score is, the better the discrimination and calibration of a model is [21]. (3) Prepared a nomogram to display the CTPF model-predicted one-, two-, and three-year cumulative risk of death in patients with different CT stage and PF grade [23].

Patients' Clinical Characteristics
Patient screening ow chart is displayed in Fig. 2  CT Score values by reviewer 1 and CT Score values by reviewer 2 were the scores from the two radiologists using the "4-section honeycomb lung percentage" method to score patients' HRCT imaging results. CT-based stage: The stage was determined by using the average score of the two radiologists and following the criteria described in Table 2. PF-based grade: The grade was determined by using the pulmonary function and physiological parameters (age, gender, FVC%pred, DLco%pred, and SpO2%) and following the description in Table 2. The grade was de ned as: mild (a), moderate (b), and severe (c). GAP (gender, age, and physiologic variables) stage followed the recommendation by Brett Ley, and a higher stage represented a greater death risk. CPI: composite physiologic index. In 2002, Athol U. Wells and colleagues proposed to use CPI, which combined chest CT and pulmonary functional parameters, to assess the severity of interstitial lung diseases (ILDs). A higher CPI represents a more severe ILD.
The Relationship Between CT-based Stage/PF-based Severity and Pulmonary Function and Death Risk The average CT scores of the 192 patients from the two radiologists using the "4-section honeycomb percentage" method were 24.4 ± 14.1 and 24.7 ± 14.4, respectively; the highest scores were 67 and 65, respectively, and the lowest values were 1 and 3, respectively ( Table 3). The inter-observer variability of the scores from the two radiologists was 0.95 (P < 0.05). For each patient, the mean CT score from the two radiologists was used as the nal CT score. The nal CT scores were used in the Spearman correlation analysis to assess the correlation between the CT scores and pulmonary function parameters (Fig. 4). The CT scores negatively correlated with FVC%pred (r s = -0.47, P < 0.01, Fig. 4A), DLco%pred (r s = -0.66, P < 0.01, Fig. 4B), and SpO 2 % (r s = -0.40, P < 0.01, Fig. 4C) and positively correlated with CPI index (r s =0.63, P < 0.01, Fig. 4D), which represented ILD severity. These data support that the "4-section honeycomb lung percentage" scoring method can effectively represent the severity of pulmonary brosis.
To analyze the correlation between CT-based stage and death risk, we performed Fine-Gray univariate regression (Fig. 5A) and multivariate regression to eliminate the potential confounding effects from the PF-based grade (Fig. 5B). Both analyses revealed that CT stage positively correlated with death risk.
Similarly, both Fine-Gray univariate regression (Fig. 5C) and multivariate regression to eliminate the potential confounding effects from the CT-based stage (Fig. 5D) found that PF-based grade also positively correlated with death risk.

CTPF stage
HRCT images of two representative cases are displayed in Fig. 3. Figure 3A shows that the patient was CT-based stage III and PF-staged grade c and thus CTPF stage III c. The patient developed IPF exacerbation and died 23 months after the patient's clinical data were acquired for the assessment of this study. Figure 3B shows CT-based stage II and PF-based grade a and thus CTPF stage II a, and this patient survived well in the 39-month follow-up visit.   Table 2 Figure 6B is the nomogram showing CTPF-based death risk prediction, which was prepared from the CT-based stage and PF-based grade multivariate Fine-Gray regression coe cients. Figures 6C,   6D, and 6E show the calibration curves of the four prediction models after Bootstrap cross-validation. Figure 6 suggest that the CTPF stage are the best model to predicting death risk. Table 5 displayed the one-, two-, and three-year cumulative death risks of patients calculated by different CTPF stage CT II: Honeycomb lesion area was < 25% of the entire lung. CT III: Honeycomb lesion area was 25%-49% of the entire lung. CT IV: Honeycomb lesion area was 50%-75%. PF-based grade was determined by assessing the scores of age, gender, FVC%pred, DLco%pred, and SpO 2 % according to the criteria in Table 2 and adding the scores. PF (a): score 0-3. PF(b): score 4-6. PF(c): score 7-10.

Discussion
Comparison of several available IPF staging methods (Table 1) shows that the staging results from some methods, such as the GAP and JRS methods, fail to accurately re ect IPF severity and predict prognosis because the methods include too few parameters. The calculation methods in the CRP and CPI scoring systems are too complex to be adopted in clinical practice [10]. Therefore, a new scoring method that can accurately assess IPF severity, predict prognosis, and can be used easily is greatly needed.
Chest HRCT is one of the common clinical examinations to diagnose IPF and assess IPF severity and prognosis. Honeycomb lung is the most representative lesion of pulmonary brosis, and the area of honeycomb lesion directly correlates to IPF prognosis [11][12][13][14]16].
Currently, CT scoring for IPF patients includes manual semi-quantitative evaluation and total quantitative evaluation by arti cial intelligence. Although the manual method is simple to use, the evaluation results are susceptible to the wide variation from different evaluators [24][25][26]. We took applicability in clinical practice into consideration and based on calculus principles to develop a "four-section honeycomb lung percentage" method, which can determine the proportion of honeycomb lung accurately and reduce interevaluator variation. In the current study, two radiologists reviewed patient HRCT results and determined the honeycomb lung percentage independently. The consistency coe cient of the two radiologists' scoring results was 0.95 (P < 0.05), and the brosis stage determined according to the honeycomb percentage was also consistent in the two radiologists. In addition, the CT-based stage negatively correlated with patients' lung function parameters (FVC%pred, DLco%pred, and SpO 2 %) and positively correlated with CPI index (Fig. 4). The CPI index re ects IPF severity. Patients with higher CT-based stage had a greater accumulative death risk. These results indicate that our CT-based brosis staging method may effectively re ect IPF severity and prognosis.
Previous studies have shown that age, gender, oxygen use at rest, lower FVC %pred and lower DLco % pred were associated closely with risk of death in patients with IPF [4-8, 18, 27]. Thus, we selected the 5 important and clinical easily available lung function and physiological parameters, FVC%pred, DLco%pred, SpO 2 %, age, and gender to assess IPF severity grade (PF-based severity grade). Both our univariate and multivariate regression analysis revealed that PF-based severity grade was an independent risk factor for death from IPF.
Compared with the CT-based brosis staging method, the PF-based severity grading method, and the GAP staging method, the CTPF comprehensive staging method, which combined the CT-based brosis staging and the PF-based severity grading methods, showed the best AUC value, Brier score, and stability in terms of predicting death risk. For example, the case presented in Fig. 3A was CTPF stage III c, and his predicted 2-year death risk was 49.65% according to Table 5. The patient died of acute IPF exacerbation 23 months after his clinical data were collected for the assessment in this study. The case in Fig. 3B was CTPF stage II a, which corresponded to a predicted 3-year death risk of only 17.50%. This patient survived well 39 months after his data were collected for the assessment. These results support that our CTPF comprehensive staging method can accurately predict patient death risk.
Lung transplantation has been considered to be an effective treatment for improving the survival of patients with IPF. Thus, we used lung transplantation as a competitive risk of death to calculate death risk when we validated the new CTPF comprehensive staging method. However, lung transplantation also has a death risk. In 2015, Yusen, RD et al [28] reported that the global lung transplantation one-year and three-year death risk was 20% and 35%, respectively. When the death risk (

Consent for publication
Not applicable Availability of data and materials The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Con ict of Interest
The authors con rm that there are no con icts of interest.  there was positive honeycomb lesion in the area, and the total score of the entire lung was used as the total honeycomb lung score. The total traction bronchiectasis score was calculated in the same way as the total honeycomb lung score. The honeycomb lung percentage was calculated as: (total honeycomb score + total traction bronchiectasis score) ÷ total number of the small areas × 100%. For example, if the 8 lung sections were evenly divided into 100 small areas and 30 of them were scored as positive honeycomb lung or traction bronchiectasis, then the honeycomb lung percentage was 30%  DLco × 100%. PF: pulmonary function & physiological parameters. PF score: age, gender, FVC%pred, DLco%pred, and SpO2% were scored and added. PF-based grade was determined by using the pulmonary function and physiological parameters (age, gender, FVC%pred, DLco%pred, and SpO2%) and following the description in Table 2. The grade was de ned as: PF score 0-3 was mild (a); PF score 4-6 moderate (b); PF score 7-10 severe (c). SpO2%: oxygen saturation of peripheral blood. FVC: forced vital capacity. FVC% pred: actual FVC/predicted FVC ×100%. DLco: diffusing capacity of the lung for carbon Monoxide. DLco% pred: actual DLco/ predicted DLco ×100%. CPI: composite physiologic index. In 2002, Athol U. Wells and colleagues proposed to use CPI, which combined chest CT and pulmonary functional parameters, to assess the severity of interstitial lung diseases (ILDs). A higher CPI represents a more severe ILD. CT score: mean CT score from the two radiologists.  Table 2. The de nition of CT-based stage was: honeycomb lung < 25% was Stage II; honeycomb lung 25%-49% Stage III; honeycomb lung 50%-75% Stage IV; honeycomb lung >75% Stage V.
PF-based grade: The grade was determined by using the pulmonary function and physiological parameters (age, gender, FVC%pred, DLco%pred, and SpO2%) and following the description in Table 2.