Development and Analysis of a Predictive Nomogram Assessing Cancer Risk in a Chinese Cohort of Patients Presenting with Pulmonary Nodules


Background: Lung cancer is a major global threat to public health for which a novel prognostic nomogram is urgently needed.
Patients and methods: Here, we designed a novel prognostic nomogram using a design dataset of 178 pulmonary nodules and an external validation dataset of 124 nodules. The R 'caret' package was used to separate the design patients into two groups: a training cohort (n=126) for model construction and an internal validation cohort (n=52). Optimal feature selection for this model was achieved using least absolute shrinkage and selection operator (LASSO) regression. C-index values, calibration plots, and decision curve analyses were used to gauge the discrimination, calibration, and clinical utility, respectively, of this predictive model. Validation was then performed with the internal and external validation cohorts.
Results: A predictive nomogram was successfully constructed incorporating hypertension status, plasma fibrinogen levels, serum uric acid (SUA) levels, triglyceride (TG) and high-density lipoprotein (HDL) levels, density, spicule sign, ground-glass opacity (GGO), and pulmonary nodule size. This model exhibited good discriminative ability in the training cohort, with a C-index value of 0.795 (95% CI: 0.720–0.870), and was well-calibrated. When we used the validation cohorts to evaluate the model, the C-indexes were 0.886 (95% CI: 0.800–0.972) and 0.817 (95% CI: 0.747–0.897) for internal validation and external validation, respectively. Decision curve analyses indicated the clinical value of this predictive nomogram when used at a lung cancer probability threshold of 9%.
Conclusion: The nomogram constructed in this study, which incorporates hypertension status, plasma fibrinogen levels, SUA, TG, HDL, density, spicule sign, GGO status, and pulmonary nodule size, was able to reliably predict lung cancer risk in this Chinese cohort of patients presenting with pulmonary nodules.


Introduction
Lung cancer is a form of malignancy arising from the unrestrained growth of bronchial and lung cells 1-2, and it is one of the leading causes of mortality in the world 3. Rates of lung cancer have been rising rapidly in recent years, particularly in more heavily industrialized nations 4. Currently, lung cancer patients exhibit 5-year survival rates of approximately 16.6% 5, and approximately 1 million individuals in China are forecast to suffer from lung cancer by the year 2025, with this nation exhibiting the highest global incidence of lung cancer.
Key risk factors associated with lung cancer incidence include specific genetic mutations, smoking, and environmental exposures such as air pollution. There is also some evidence suggesting that factors such as a poor diet, alcohol intake, estrogen levels, the smoking of marijuana, and infection with human papillomavirus (HPV), human immunodeficiency virus (HIV), and Epstein-Barr virus may increase lung cancer risk, although such evidence remains somewhat inconclusive 6. Analyses of patient computed tomography (CT) scans often reveal pulmonary nodules, and many models have been developed to gauge the link between such nodules and lung cancer risk, including the Brock model 7.
These models, however, often do not take epidemiological variables, clinical findings, and CT scan results into consideration at the same time, limiting their value as predictors of the relative risk of a given pulmonary nodule being malignant. The development of more reliable and accurate predictive tools has the potential to enable early intervention and treatment for lung cancer patients, maximizing their odds of positive outcomes. Herein, we analyzed 26 variables with potential relevance to the diagnosis of a given pulmonary nodule as being benign or malignant based on previous studies 7,8,9,10.
By analyzing epidemiological, clinical, and CT-related factors for patients with pulmonary nodules that had undergone surgical treatment, we sought to develop a simple but robust predictive model that would enable the relative assessment of lung cancer risk based only on characteristics that can be readily assessed prior to surgery or other therapeutic interventions.

Patients
The Ethics Committee of the Affiliated Lihuili Hospital of Ningbo University approved this study (approval no. KY2020PJ141). Enrolled patients were individuals from China recruited at the Lihuili Hospital of Ningbo University of Medicine between October 2020 and April 2021, with the later external validation cohort recruited from June 2021 to October 2021. Eligible patients were individuals who had undergone resective surgeries following pulmonary nodule identification. Patients provided written informed consent to participate in this study. Any patients diagnosed with serious cognitive or physical impairments, or other serious diseases, were excluded from the study cohort. Data including patient clinical, demographic, and disease-related characteristics were retrieved from patient medical records. The R 'caret' package was used to randomly separate the design patients into two groups at a 7:3 ratio: a training cohort for nomogram construction and an internal validation cohort.
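The 7:3 random split described above can be illustrated with a minimal sketch. This is not the authors' R code: it is a Python stand-in that assumes the split was stratified by nodule status, and the function name, seed, and class mix (100 malignant, 78 benign) are hypothetical values chosen only for illustration. The exact cohort sizes obtained in practice depend on the class mix and on how fractional counts are rounded.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=2021):
    """Split indices into training and validation sets, per outcome class.

    Stratifying by class keeps the benign/malignant proportions roughly
    equal in both cohorts. The seed is arbitrary (hypothetical).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical labels for 178 design patients; the 100/78 mix is made up.
labels = [1] * 100 + [0] * 78
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

Fixing the seed makes the partition reproducible, which matters because every downstream estimate (selected features, C-index) depends on which patients land in the training cohort.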

Statistical analysis
Data are given as numbers (percentages), and were analyzed using R (v 4.1.2, https://www.R-project.org).
The LASSO method, which enables the reduction of high-dimensional datasets, was utilized as a means of selecting the optimal predictors of lung cancer risk among included pulmonary nodule patients. Those features that yielded non-zero coefficient values in this LASSO regression analysis were retained for nomogram incorporation. The final predictive model was constructed via a multivariate logistic regression approach, with all significance levels being two-sided. The training cohort was used to develop the predictive model, with calibration curves being used to assess nomogram calibration. A significant calibration test result was indicative of a model that was not perfectly calibrated. Model discrimination performance was assessed based on the value of Harrell's C-index. Validation of this nomogram was additionally performed to calculate an accurate C-index value. Decision curve analyses were used to assess the clinical utility of this lung cancer risk nomogram by quantifying the net benefit at different probability thresholds in the lung cancer cohort. The net benefit was calculated by subtracting the proportion of patients with false-positive results from the proportion of patients with true-positive results, with the false-positive proportion weighted by the relative harm of failing to intervene as compared to the potential negative outcomes associated with an unnecessary intervention. Receiver operating characteristic (ROC) curves were also used to assess the precision of this predictive risk model.
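The net benefit calculation used in decision curve analysis can be made concrete with a short sketch. It assumes the standard formulation, in which the false-positive proportion is weighted by the odds of the threshold probability, pt / (1 - pt). The function names and the four-patient toy dataset are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm-to-benefit exchange rate
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every patient regardless of risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Toy data (hypothetical): two malignant and two benign nodules.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A decision curve plots these values across a range of thresholds. A model is considered useful over the thresholds where its net benefit exceeds both the treat-all reference and the treat-none reference, the latter being zero by definition.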

Patient characteristics
In total, data from 178 patients with pulmonary nodules who visited our clinic between October 2020 and April 2021 were included in the design dataset of this study, along with 124 patients seen from June 2021 to October 2021 for external validation. The design patients were randomly assigned to training and internal validation cohorts at a 7:3 ratio. Patients in the training cohort (74 males, 52 females, mean age: 60.64±10.50 years [range: 31-86 years]) were separated into groups with benign nodules and malignant lesions. For details regarding the demographic and clinical characteristics of patients in these groups, see Table 1.

Feature selection and predictive model development
In total, 26 potentially relevant features were evaluated for inclusion in a predictive model. Of these features, 9 were ultimately selected through a LASSO regression analysis of the 126 patients in the training cohort (Figure 1A and B). These features included hypertension status, serum uric acid (SUA), triglyceride (TG) and high-density lipoprotein (HDL) levels, plasma fibrinogen levels, density, ground-glass opacity (GGO) status, spicule sign, and pulmonary nodule size. A predictive model incorporating these nine variables was developed using the training cohort (Figure 2). Nodule density was defined as being "low" when it exhibited a CT value that was higher than that of pulmonary tissue but lower than that of blood, "intermediate" for nodules with solid and GGO components, and "high" when CT values were greater for the nodule than for blood.
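The way LASSO discards features by driving their coefficients to exactly zero can be illustrated with a small sketch. It is a deliberate simplification: it fits a linear model by coordinate descent, whereas the study applied LASSO to a binary outcome. The data, penalty value, and function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + penalty * ||b||_1.

    Assumes every column of X has at least one non-zero entry.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (z / n)
    return beta


# Toy data (hypothetical): the outcome depends strongly on the first feature
# and only weakly on the second.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
print(lasso_coordinate_descent(X, y, penalty=0.5))
```

With this penalty the weak second feature receives a coefficient of exactly zero and would be dropped, while the strong first feature is retained with a shrunken coefficient. This is the sense in which only features with non-zero coefficients are carried forward into the nomogram.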

Assessment of predictive risk model performance
Calibration curves for this predictive nomogram when used to analyze the training cohort revealed it to be well-calibrated, with a C-index value of 0.795 (95% CI: 0.720-0.870) (Figure 3A). Similarly, the C-indexes for internal validation and external validation were 0.886 (95% CI: 0.800-0.972) (Figure 3B) and 0.817 (95% CI: 0.747-0.897) (Figure 3C), respectively, consistent with the discriminative value of this model and suggesting that it exhibits good predictive capabilities.
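For a binary outcome such as benign versus malignant, the C-index is the proportion of malignant/benign patient pairs in which the malignant nodule received the higher predicted risk, which is equivalent to the area under the ROC curve. The following is a minimal sketch under that definition. The function name and toy data are hypothetical, and confidence intervals such as those reported above would require an additional step (for example, bootstrapping) that is not shown.

```python
def c_index_binary(y_true, y_prob):
    """Concordance index for a binary outcome; tied predictions count as 0.5."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes must be present")
    score = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                score += 1.0   # concordant pair
            elif p_case == p_control:
                score += 0.5   # tied pair
    return score / (len(cases) * len(controls))


# Toy data (hypothetical): two malignant and two benign nodules.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
print(c_index_binary(y_true, y_prob))  # 3 of 4 pairs are concordant -> 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of malignant from benign nodules.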

Analysis of model clinical utility
Decision curve analyses for this predictive nomogram were next performed (Figure 4). These analyses revealed that if the threshold probability of a patient and a doctor is >9% and <98%, respectively, then this nomogram exhibits value as a means of predicting lung cancer risk. Net benefit was comparable, with some overlaps, within this range when assessing lung cancer risk.

Discussion
Nomograms are valuable predictive tools that have been widely utilized in oncology and other clinical and research fields, offering a user-friendly approach to intuitively assessing the odds of a given prognosis or outcome based on a set of specific variables, thereby aiding in clinical decision-making 11. Many models for the treatment of pulmonary nodules were established on the basis of certain epidemiological variables and CT scan results. However, clinical findings such as blood biomarkers are also very important for the diagnosis of lung cancer 12. Moreover, for some of these variables, such as GGO, the surgical criteria are not well defined, such that treatments are often conducted according to surgeons' own experience 13,14. As such, we herein sought to develop a new nomogram capable of predicting the relative risk of malignancy when evaluating patients with pulmonary nodules.
Here, we designed and validated a novel predictive model capable of assessing the risk of a given lung nodule being benign or malignant based on analysis of data from patients who had undergone pulmonary nodule resection. The resultant model incorporated demographic, disease-, and treatment-related features to easily predict the odds of a given pulmonary nodule corresponding to a lung cancer diagnosis. The model developed herein was accurate, and exhibited good calibration and discrimination in our validation cohort. The C-index value in this validation cohort was also high, indicating that the nomogram can be accurately used to gauge patient risk of pulmonary nodule malignancy 11.
Prior studies have confirmed that hypertension is a common comorbidity in cancer patients 15. Several mechanisms may explain this observation, including the fact that hypertension can increase VEGF levels in the plasma 16. We identified hypertension as a risk factor for lung nodule malignancy. Fibrinogen has also been significantly linked to the risk of lung cancer in the past 17  shown to be negatively correlated with the risk of lung cancer in one cohort study 21, with other studies having similarly supported the existence of lower HDL levels in lung cancer patients relative to healthy individuals 22. HDL levels are also readily measured in a clinical context. TG levels have also been reported to be positively correlated with lung cancer incidence in an analysis conducted by Lin et al. of 4673 lung cancer patients in a cohort of 685,852 individuals 18. Low and high TG levels have been linked to higher rates of lung cancer in a prospective cohort study 23. With respect to spicule sign, Fang et al. previously conducted a case-control study demonstrating that stage I lung adenocarcinoma patients were significantly more likely to exhibit this finding 24. GGO findings have been reported to be associated with cancer rates as high as 63%, with many surgeons believing that GGO nodules should be resected, particularly if they grow in size. Persistent GGO nodules may be indicative of a greater risk of malignancy when solid components are evident 25. Tu et al. found CT density to be a valuable feature when differentiating between nodules that were malignant and benign 26. Qiu et al. further determined that solitary ground-glass opacity nodule size and density upon high-resolution CT evaluation were associated with invasive adenocarcinoma risk 27. Nodule size may be the most important variable included in our predictive model, given that nodule diameter is a key determinant of treatment under the British Thoracic Society guidelines 28 and Fleischner Society Guidelines 29. For nodules ≥ 10 mm in diameter, the odds of malignancy in the NELSON screening study were 15.2% 30. As such, we included nodule diameter as the size variable in the present study.
Herein, we found that pulmonary nodules > 8 mm in size were more likely to be malignant than smaller nodules (57.53% vs. 40.63%), suggesting that a predictive model including this parameter, after being appropriately calibrated, may aid in improving lung cancer patient outcomes by providing individualized predictions of risk. We thus developed a risk nomogram that may aid clinicians in differentiating between patients with benign or malignant lung nodules. It may also aid in the optimal selection of pulmonary nodules in the context of clinical research. For example, this model might be used to aid investigators in selecting patients with larger nodules and other risk-related findings when selecting subjects for surgical procedures or other interventions. Early interventions including CT scans, biochemical analyses of blood samples, and family support can better benefit low-risk patients, while regular clinical examination can ensure the appropriate monitoring of lung nodules to better guide the appropriate assessment of patient prognosis.
Accurate prognostic evaluations can aid surgeons in predicting lung cancer risk in individual patients, ensuring timely intervention for high-risk patients while reducing the need for interventional treatment in low-risk patients. Accurately predicting the risk of lung cancer in a given patient is very challenging, and appropriate measurements together with multifaceted interventional approaches are thus the most reliable approach to detecting and evaluating patients with pulmonary nodules. Further research on this topic is warranted, as the accurate detection of pulmonary nodules alone is necessary but insufficient for treating affected patients, underscoring directions for future study.

Limitations
There are multiple limitations to this study. For one, the sample size of this study was limited, and all patients were enrolled from a single center over a relatively limited study period. However, nomograms established by Chen et al. 31 and Luo et al. 32, with the training and validation cohorts (61/101 and 32/43 patients, respectively), exhibited good accuracy. Additionally, risk factor analyses did not incorporate all possible risk factors that may be relevant to the differentiation between benign and malignant nodules. Other relevant factors not included in this analysis included the number of nodules and specific comorbidity incidence rates. Lastly, while a bootstrap testing approach was used to validate our nomogram, the patients used for this validation approach may not be sufficient to ensure the generalizability of these data to patients from other countries or regions. As such, further external validation in a wider pulmonary nodule patient population will be essential in the future.

Conclusions
In summary, we herein designed a novel nomogram with good accuracy that offers value as a means of differentiating between benign and malignant pulmonary nodules, enabling clinicians to better plan patient treatment. Such individualized risk analyses offer clinicians an opportunity to appropriately monitor and treat patients. However, further work will be needed to validate this nomogram in larger patient populations and to establish whether the treatment decisions made based on this nomogram will reduce rates of incorrect diagnosis and treatment planning for patients with pulmonary nodules.

Figure 2
Lung cancer risk nomogram. Note: An initial training cohort was used to develop this nomogram, which incorporated SUA, hypertension status, HDL, TG, plasma fibrinogen, pulmonary nodule size, GGO status, density, and spicule sign.

Figure 3
Calibration curves for lung cancer nomogram predictions of the training cohort (A), internal validation cohort (B), and external cohort (C), respectively. Note: Predicted risk of lung cancer and actual lung cancer diagnoses are shown on the x-axis and y-axis, respectively, with the dotted line corresponding to a diagnostic model with perfect predictive accuracy and the solid line corresponding to actual nomogram performance. The closer these lines are to one another, the better the predictive performance of this nomogram.

Figure 4