Our systematic review describes and evaluates published predictive models of Solitary Pulmonary Nodule (SPN) malignancy built from SPNs incidentally encountered in routine clinical practice. The findings show that scientific interest in developing new predictive models is increasing: 67% of the articles had been published within the previous 5 years. However, the design of the predictive models assessed showed important methodological deficiencies that compromise their clinical applicability. To describe the models, we followed the Fleischner Society recommendations(7) for the management of incidentally found solitary pulmonary nodules (solid or subsolid). To evaluate the applicability and transferability of the predictive models to clinical practice, we used the PROBAST tool(11).
To our knowledge, this is the first systematic review of studies that develop predictive models of SPN malignancy in routine clinical practice. Of the included studies, 73% (11/15) were performed in Asian populations. A recent prospective study of a multiethnic cohort corroborated that, at a low number of cigarettes consumed, Native Hawaiians and African Americans have twice the excess risk of developing lung cancer compared with Japanese Americans and Latinos(47). However, in this review we did not find any predictive models based on Native Hawaiian or African American populations. Moreover, although the Fleischner guidelines(7) consider race to be a risk factor for SPN malignancy, this risk factor was not included in any of the models reviewed.
Age, followed by nodule size (diameter), was the most frequently identified independent predictor, in 13 and 9 studies, respectively. This is in line with the scientific evidence(6,7) showing that the risk of malignancy increases with age and SPN diameter.
The Fleischner recommendations(7) on nodule size are to use the average diameter, defined as the mean of the long- and short-axis diameters, both of which should be obtained on the same transverse, coronal, or sagittal reconstructed image; this more accurately reflects three-dimensional tumor volume. Of the 15 models, only 4 described how the nodule diameter was measured: 3 studies(16,23,29) reported only that the images of the nodule were acquired in three-dimensional mode, and 1(23) reported that the long and short axes of the nodules were measured and the ratio of the short to long axis was calculated. Nodule diameter was not identified as an independent predictor of SPN malignancy in any of these studies.
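As a minimal sketch of the measurement described above, the average diameter and the short-to-long axis ratio could be computed as follows. The function names and the example values (in millimetres) are hypothetical and are not taken from any of the reviewed studies; both axes are assumed to have been measured on the same reconstructed image.

```python
def _check_axes(long_axis_mm, short_axis_mm):
    """Validate that both axes are positive and correctly ordered."""
    if long_axis_mm <= 0 or short_axis_mm <= 0:
        raise ValueError("axis lengths must be positive")
    if short_axis_mm > long_axis_mm:
        raise ValueError("short axis cannot exceed long axis")


def average_diameter(long_axis_mm, short_axis_mm):
    """Mean of the long- and short-axis diameters (same image plane assumed)."""
    _check_axes(long_axis_mm, short_axis_mm)
    return (long_axis_mm + short_axis_mm) / 2.0


def axis_ratio(long_axis_mm, short_axis_mm):
    """Ratio of the short axis to the long axis (1.0 means a round nodule)."""
    _check_axes(long_axis_mm, short_axis_mm)
    return short_axis_mm / long_axis_mm


# Hypothetical nodule: long axis 10 mm, short axis 6 mm.
print(average_diameter(10.0, 6.0))  # 8.0
```

Reporting both input axes alongside the derived value makes the measurement reproducible, which is the detail most of the reviewed models omitted.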
As regards sex, differences have been observed in the clinical management of SPNs: diagnostic delays leading to therapeutic delays, and greater radiation exposure, have been identified in women(48). In our review, all studies included women, and in one(20), the predictive model for the highest proportion of ground glass (≥50%) identified female sex as an independent predictor.
As regards morphology, SPN spiculation was a frequent predictor, identified in 8 studies(15,17–19,21,24,25,28), and lobulation was also a significant final predictor of SPN malignancy in 4 studies(15,20,28,29).
Regarding calcification, central, lamellar, diffuse, or popcorn patterns suggest benignity, while dotted patterns or eccentric localization suggest malignancy. Calcification was predictive in 7 models(15,18–20,22,24,28). However, because the calcification pattern was not taken into account, nodules with benign-type calcification were treated in the same way as those with patterns suggesting malignancy, possibly biasing the prediction of malignancy.
Although smoking is considered the highest risk criterion, it was identified as a predictor in only 6 of the models(15,17,22,25,27,28). In the rest(16,18–21,23,24,26,29), it was perhaps not identified because the proportion of smokers/ex-smokers was low and the malignant SPNs included a greater proportion of adenocarcinomas, a histological type that is less strongly related to this exposure.
A history of any type of cancer in family members was collected in 6 studies(15,19,20,24,25,29) and was identified as a predictor of malignancy in 2(15,19). A personal history of cancer was collected in 11 studies(15,17–19,21,24–29), and in 4 of the models(17,24,26,28) it was found to be a predictor of malignancy. Although genetic susceptibility has been described previously, with an association reported between a history of cancer in first-degree relatives and an increased risk of lung cancer in both sexes(49), only one study(21) evaluated a history of lung cancer in relatives, and it found that this was not a predictor of malignancy.
Some models found that the biomarkers CEA(15,18,24) and CYFRA 21-1(15,25) were final predictors of malignancy; however, none of the studies performed external validations, nor do the Fleischner guidelines include them as risk factors for malignancy. Further studies are required to assess their future importance in routine clinical practice.
Exposure to other carcinogens (asbestos, uranium, radon) has been described as a risk factor for lung cancer(7,50). However, only one study collected data on asbestos exposure(17), and it did not identify it as a predictor. Although passive exposure to tobacco smoke is one of the causes of lung cancer, and 40% of children, 33% of non-smoking men, and 35% of non-smoking women worldwide have been shown to be exposed(51), only one study(23) analysed it, and it did not find passive exposure to be an independent predictor of malignancy.
According to the Fleischner guidelines, lung cancers occur more frequently in the upper lobes. However, although all studies collected the nodule location, only one study, conducted in the USA(17), identified it as an independent predictor. In China, there is a high prevalence of tuberculosis and other granulomatous diseases, which are typically located in the upper lobes. Most of the studies in this review involved Asian populations, and no relationship between nodule location and malignancy was observed in them.
Finally, emphysema, which is considered a risk factor(7), was evaluated in 2 articles(21,24), although it was not predictive in either. Chronic Obstructive Pulmonary Disease (COPD) was evaluated in a single study(21) but was not identified as a predictor. In another study(28), a history of chronic lung disease was a final predictor, but the type of disease was not specified. A recent meta-analysis confirms that this comorbidity is frequent in patients with lung cancer, and that both it and emphysema increase the level of risk, especially in heavy smokers(52).
Assessment of the Prediction Model Risk of Bias
We followed the PROBAST guidelines, which group potential biases into 4 domains (participants, predictors, outcome, and analysis), to set out several methodological deficiencies of the included studies(11).
There is clear disagreement between the prevalence of SPN malignancy found in the models included in this review (between 23% and 77.45%) and the prevalence in daily clinical practice (between 12.1% and 18.2%)(5). This is probably because most models are based on the population referred for surgery or biopsy, which introduces selection bias: an important group of the patients seen in routine clinical settings (those considered to be at lower risk of malignancy and therefore less likely to be sent for surgery or biopsy) is not included in most of the models studied. This selection bias occurs in all the studies except three(17,27,29), which used a case–control design nested in a cohort study and also included patients who required only radiological follow-up. The rest describe themselves as retrospective cohort studies(15,16,18,19,21,23–26), and in three the type of study is not well established(20,22,28).
According to PROBAST(11), the prospective cohort study is considered the optimal design, with a low risk of bias, since it allows all the information on the potential predictors (exposures) to be collected before the outcome occurs, thus reducing selection or interviewer biases. Case–control studies that are not nested in a cohort select a population from a study designed for another purpose and therefore have a higher risk of bias. In line with the results obtained by Collins et al(53), the models are seldom prospective and usually use information from populations intended for a completely different purpose.
Nodule consistency (solid, subsolid) is a determining factor when predicting SPN malignancy. The stability of solid nodules is established over a period of 2 years(6,7), whereas for subsolid nodules the period is 5 years(7). Thus, longer initial follow-up intervals and longer total follow-up periods are recommended for subsolid nodules than for solid nodules. Bearing this in mind, follow-up was insufficient in the 3 studies that reported it(17,27,29), each of which followed patients for 2 years. The remaining studies(15,16,18–25,27,29) did not specify whether they followed up.
In some models, continuous variables were categorized: in one(21), the biomarker values were dichotomized; in others, smoking history was dichotomized at ≥30 pack-years(28) or ≥400 pieces-year(20); and in one, age was dichotomized at ≥70 years(26). Dichotomization imposes an arbitrary cut-off point at which the risk level changes abruptly, which discards information and reduces predictive capacity(11).
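The loss of information caused by dichotomization can be illustrated with a small sketch. The ages and outcomes below are invented toy values, not data from any reviewed study, and the `auc` helper is a hypothetical name for a plain pairwise implementation of the area under the ROC curve. The example only shows what can happen in one toy sample; it is not a general proof.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): age in years and malignancy status (1 = malignant).
ages = [45, 52, 58, 63, 66, 71, 74, 79]
malignant = [0, 0, 1, 0, 1, 0, 1, 1]

auc_continuous = auc(ages, malignant)

# Same predictor after applying an arbitrary cut-off at 70 years.
age_ge_70 = [1 if a >= 70 else 0 for a in ages]
auc_dichotomized = auc(age_ge_70, malignant)

print(auc_continuous, auc_dichotomized)  # 0.8125 0.625
```

In this toy sample, patients aged 58 and 66 become indistinguishable from those aged 45 and 52 once the cut-off is applied, which is why discrimination drops.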
In most of the studies, the analysis does not mention patients with missing data. We interpret this to mean that such patients were omitted, i.e., that an “available/complete case analysis” was performed. This is the most frequent type of analysis in predictive models, and we assume it was used in the 14 of the 15 studies that did not report this information. Excluding cases with missing data biases the association between the predictors and the outcome and skews the performance of the model, because the subpopulation that remains after exclusion may not be representative of the target population. Only one study(21) took missing data into account, using multiple imputation, which is the method recommended by PROBAST and carries a lower risk of bias(11).
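The following sketch illustrates how a complete case analysis can change the composition of a sample. The records, field names, and helper functions are all hypothetical and were invented for illustration; the sketch shows only the exclusion step and does not implement multiple imputation.

```python
# Hypothetical patient records; None marks a missing predictor value.
records = [
    {"age": 62, "diameter_mm": 12.0, "smoker": True, "malignant": 1},
    {"age": 55, "diameter_mm": 8.0, "smoker": None, "malignant": 0},
    {"age": 70, "diameter_mm": 20.0, "smoker": True, "malignant": 1},
    {"age": 48, "diameter_mm": None, "smoker": False, "malignant": 0},
    {"age": 66, "diameter_mm": 15.0, "smoker": False, "malignant": 1},
    {"age": 59, "diameter_mm": 9.0, "smoker": False, "malignant": 0},
    {"age": None, "diameter_mm": 7.0, "smoker": False, "malignant": 0},
    {"age": 73, "diameter_mm": 11.0, "smoker": True, "malignant": 1},
]

PREDICTORS = ["age", "diameter_mm", "smoker"]


def complete_cases(rows, fields):
    """Keep only the rows in which every listed field is observed."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def prevalence(rows):
    """Proportion of malignant nodules among the given rows."""
    return sum(r["malignant"] for r in rows) / len(rows)


kept = complete_cases(records, PREDICTORS)
print(len(records), len(kept))               # 8 5
print(prevalence(records), prevalence(kept))  # 0.5 0.8
```

In this invented sample, every excluded patient has a benign nodule, so the retained subset overstates the prevalence of malignancy; this is the kind of non-representativeness the paragraph above describes.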
The optimal sample size for binary prediction models is considered to be a minimum of 100 events (preferably ≥200) for external validation(53) and 10-15 Events Per Variable (EPV) (preferably ≥20)(54) for model development(11). This was the case in only 8 studies(15,16,18,23,24,27–29).
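A minimal sketch of the EPV calculation follows. It assumes the common convention that the number of events is the size of the smaller outcome group and that the denominator counts all candidate predictor parameters considered during development, not only those retained in the final model. The function name and the example counts are hypothetical.

```python
def events_per_variable(n_malignant, n_benign, n_candidate_parameters):
    """EPV = size of the smaller outcome group / candidate predictor parameters."""
    if n_candidate_parameters <= 0:
        raise ValueError("at least one candidate parameter is required")
    n_events = min(n_malignant, n_benign)
    return n_events / n_candidate_parameters


# Hypothetical development cohort: 120 malignant and 300 benign nodules,
# with 12 candidate predictor parameters screened.
epv = events_per_variable(120, 300, 12)
print(epv)  # 10.0
```

With these invented counts the model would sit at the lower bound of the 10-15 EPV range quoted above; screening more candidate predictors without more events would push it below that bound.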
The external validation of any development model in an independent sample is essential to demonstrate its satisfactory performance, i.e., its applicability and transferability to clinical practice. One of the most important limitations of the models created so far is the lack of external validations. External authors have validated only 3 models(17,19,27) (Appendix B); the most frequently evaluated is that of Swensen et al(17), which showed good discrimination in all of these validations, with values greater than 0.75(55). Although some studies have created models and externally validated them with very promising results(18,21,25), no studies have yet corroborated the results obtained.
In some studies(15,18,23,25), the Hosmer-Lemeshow test was the only calibration method used. However, this test has limitations: large sample sizes can generate erroneous results, and it does not reveal the magnitude of the difference between the predicted and observed values(55). The calibration slope (the method most recommended by PROBAST) does not share these limitations, but it was reported in only 5 articles(17,21,22,24,27).
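As a hedged illustration of what a calibration slope measures, the sketch below regresses the observed outcome on the logit of the predicted probability using a small Newton-Raphson logistic fit written from scratch. This is one standard way of defining the calibration slope, not necessarily the procedure used in the reviewed studies; the function names and both toy datasets are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(pred_probs, outcomes, max_iter=50, tol=1e-10):
    """Fit outcome ~ intercept + slope * logit(predicted) by Newton-Raphson."""
    x = [logit(p) for p in pred_probs]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def expand(groups):
    """groups: list of (predicted probability, n patients, n events)."""
    probs, ys = [], []
    for p, n, events in groups:
        probs += [p] * n
        ys += [1] * events + [0] * (n - events)
    return probs, ys


# Toy model A: observed event rates match the predictions exactly.
_, slope_a = calibration_intercept_slope(*expand([(0.2, 10, 2), (0.5, 10, 5), (0.8, 10, 8)]))
# Toy model B: predictions are too extreme (0.1/0.9 predicted, 0.2/0.8 observed).
_, slope_b = calibration_intercept_slope(*expand([(0.1, 10, 2), (0.5, 10, 5), (0.9, 10, 8)]))
print(round(slope_a, 3), round(slope_b, 3))  # 1.0 0.631
```

A slope of 1 indicates that predictions are neither too extreme nor too moderate, while a slope below 1 (toy model B) indicates overly extreme predictions. Unlike a significance test, the slope conveys the direction and size of the miscalibration.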
The 15 models analysed showed low clinical applicability because of their high risk of bias. Routine practice requires models that are free of selection bias and that reflect the full range of malignancy risk profiles (from none to all) that may occur in a patient with an incidentally found SPN. Some models do not explicitly exclude patients with a recent history of cancer (within the last 5 years)(16,24,25,27,28); such patients are possibly more likely to experience a tumour recurrence or metastasis, which would lead to overestimation of the predictive values. In other cases, only solid nodules are included(21,24), so the models cannot be applied to patients with subsolid nodules, and vice versa. Other recommendations are that predictors and their measurements should be standard and applicable to the clinical setting; biomarkers, specifically, may not always be available.