Our data show that PAR and all three clinical risk model calculators perform poorly overall in categorizing the risk of malignancy in patients undergoing bronchoscopy for suspected lung cancer where the pathology is nondiagnostic. Although the risk models matched the expected prevalence in the intermediate risk group, more than half of all the lesions labeled high risk by all three risk models were truly benign. Use of these models in this subset of patients to guide decision-making could result in patients with benign disease undergoing repeat bronchoscopy, other invasive non-surgical sampling, or surgery.
The accuracy of a risk model calculator in an individual patient is influenced by the degree to which that patient matches the characteristics of the population used to develop the model. Risk models that directly incorporate the root causes of the disease, such as genomic alterations, are often more generalizable and robust to small cohort changes. On the other hand, models built solely on correlating factors that only have an indirect role in the disease are often more sensitive to changes when the composition of those factors changes in the patient cohort. Our data suggest that the subgroup of patients with lung lesions who undergo bronchoscopy may not have been well-represented in the training and validation sets for these models, limiting their usefulness in clinical practice.
Multiple studies have sought to compare PAR and various risk models. One recent prospective study concluded that clinician assessment was slightly better at predicting malignancy than two validated models(8), while other studies show similar accuracy between PAR and various models(6, 18). In one survey study that used clinical vignettes to test the accuracy of risk assessment, all modalities showed only modest performance, with PAR showing an AUC of only 0.70 (95% CI, 0.62–0.77), while the performance of two commonly used risk models was essentially no better(18).
PAR in this study demonstrates a pattern of categorization that better matches the expected prevalence of malignancy in each risk category. However, it too performed suboptimally in this group of patients. While PAR slightly outperformed the risk models with respect to the ROC curves, PAR’s AUC of 0.64 (excluding nondiagnostic lesions) demonstrates relatively poor discriminatory performance, whereas the AUCs for the Brock, VA and Mayo Models sit at, or close to, random chance for the study population, and only slightly better when confined to lung nodules. None of the risk calculators came near to approximating the AUCs demonstrated in their respective validation sets, lacking specificity in this cohort of patients, as evidenced by the S-shaped ROC curves for the Mayo and VA models. In the lower left quadrant of an ROC curve, where the curve reflects specificity, the plots for Mayo and VA fall to the right of the line where the true positive rate and false positive rate are equal. Thus, these risk model calculators are wrong more often than they are right when labeling a lesion high risk in this population of patients. The Brock model performs only as well as random chance. Strongly driven by size, these models are likely to interpret smaller cancers as benign while categorizing larger benign lesions as likely malignant. The larger, spiculated lesions in older patients that ultimately proved not to have cancer likely represented inflammatory or infectious findings on CT. The slightly superior performance of PAR in this subset of patients may reflect the ability of a physician to weigh clinical context as part of the risk assessment.
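To illustrate how an ROC curve can dip below the diagonal at high thresholds while the overall AUC still sits at chance, the following is a minimal sketch in Python. The scores and labels are entirely hypothetical and are not drawn from the study cohort; the helper names `roc_points` and `auc` are introduced here for illustration only.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping the threshold from high to low.

    labels: 1 = malignant, 0 = benign. Tied scores are processed together.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical example: the two highest-scored lesions are benign,
# mimicking large benign lesions that a size-driven model rates high risk.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [0, 0, 1, 1, 1, 1, 0, 0]

points = roc_points(scores, labels)
below_diagonal = [p for p in points if p[1] < p[0]]  # TPR < FPR
area = auc(points)
```

In this toy example the first operating points lie below the diagonal (the true positive rate is lower than the false positive rate), so a "high risk" label is wrong more often than right, even though the overall area under the curve is 0.5.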
Inaccurate risk assessment contributes significantly to health care costs. In a recent cost-benefit analysis using CMS claims data, 43.6% of patients who underwent a biopsy for a suspicious lung lesion were found not to have a cancer. Over 43% of the total costs of diagnostic evaluations for lung cancer in the U.S. were attributable to invasive procedures performed in patients with benign disease(19). With the growing acceptance of lung cancer screening and increased use of diagnostic CT for other reasons, we can expect to see greater numbers of patients with lung lesions that will require risk assessment(1, 20). Understanding the accuracy and limitations of the currently available tools for risk assessment in these patients will be crucial for the appropriate utilization of resources to meet the needs of this epidemic, facilitating prompt diagnosis and treatment for patients with cancer while minimizing morbidity and cost for those without.
Our study is the first to evaluate the performance of PAR and the risk model calculators in this subset of patients deemed appropriate for bronchoscopic biopsy where the effort has failed to establish a diagnosis. It has several important strengths. First, it demonstrates the scope of the problem in real-world clinical practice: 60% of cases in the Percepta BGC Registry Study had a nondiagnostic bronchoscopy. It focuses on the subset of patients in whom decision-making regarding management of suspicious lung lesions is often most difficult: those where non-surgical tissue sampling is deemed necessary but has failed once, necessitating further invasive procedures or a retreat to radiographic surveillance. Another strength is the fact that PAR was determined and recorded by the treating pulmonologist prior to the procedure, thus our PAR reflects
Our study has several limitations that are important to consider when attempting to generalize the findings to other patients undergoing risk assessment for lung cancer. Each of the models was developed and validated on a specific population, and none was developed specifically in a cohort of patients undergoing bronchoscopy; thus, their suboptimal performance in our cohort may be expected. Moreover, the two-conditional nature of our cohort, with patients first selected for bronchoscopy and then selected for a nondiagnostic result, may have resulted in a prevalence of malignancy in the high risk category that was lower than the prevalence in the group of patients undergoing bronchoscopy as a whole. If, for example, a risk model was very good at correctly identifying larger malignant lesions as high risk, and bronchoscopy was more likely to be successful in these patients, the result would be a reduction in the prevalence of malignancy in that model’s high risk category in the nondiagnostic subset. This kind of bias could account for the differential performance of the models in lung nodules compared to lung masses.
The fact that our findings may be an underestimate of the performance of the models in lesions undergoing bronchoscopy as a whole does not diminish their importance; a nondiagnostic bronchoscopy is a common, real-world scenario in which decision-making is guided by risk assessment, and accuracy in this assessment is essential.
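The selection effect described above can be made concrete with a short worked example. The sketch below uses purely hypothetical counts and diagnostic yields (they are assumptions chosen for illustration, not values from this study), and the function name `nondiagnostic_prevalence` is likewise introduced here only for the example.

```python
def nondiagnostic_prevalence(n_malignant, n_benign,
                             yield_malignant, yield_benign):
    """Prevalence of malignancy among lesions left undiagnosed by bronchoscopy.

    yield_malignant / yield_benign: assumed fraction of malignant / benign
    lesions in which bronchoscopy establishes a diagnosis.
    """
    nd_malignant = n_malignant * (1 - yield_malignant)
    nd_benign = n_benign * (1 - yield_benign)
    return nd_malignant / (nd_malignant + nd_benign)


# Hypothetical high risk category before bronchoscopy:
# 80 malignant and 20 benign lesions (80% prevalence).
prevalence_before = 80 / (80 + 20)

# Assume bronchoscopy diagnoses 75% of the malignant lesions but yields a
# specific diagnosis in only 10% of the benign ones.
prevalence_after = nondiagnostic_prevalence(80, 20, 0.75, 0.10)

print(f"High risk prevalence, all bronchoscopies: {prevalence_before:.1%}")
print(f"High risk prevalence, nondiagnostic only: {prevalence_after:.1%}")
```

Under these assumed numbers, 20 malignant and 18 benign lesions remain after a nondiagnostic bronchoscopy, so the prevalence of malignancy in the high risk category falls from 80% to roughly 53% without any change in the model itself.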
Another important consideration is the fact that the Mayo and VA risk models were developed in patients with pulmonary nodules defined as ≤ 30 mm; patients with lesions > 30 mm were not included in their validation cohorts. Twenty-six percent of our study cohort had lesions > 30 mm and therefore fell outside of the confines of these models. Weakness of these models in patients who fall outside the confines of their validation cohorts might be expected, but the Brock model was developed in a cohort with lesions up to 86 mm, and it too performed poorly in our study. It is worth noting that the online calculators based on these models accept input for size exceeding 30 mm.
What is needed is an objective, less error-prone means for risk assessment that can be used across the heterogeneous spectrum of patients with indeterminate lung lesions. Optimally, this would be an accurate, noninvasive biomarker that could further enhance the utility of clinical and radiographic factors in differentiating early-stage lung cancers from benign disease(7, 21). Various novel methods for risk assessment have been developed that utilize plasma biomarkers, autoantibodies to tumor-associated antigens, exhaled breath compounds, and bronchial and nasopharyngeal genomic classifiers to augment the accuracy of existing risk prediction models in distinguishing malignant lesions from benign disease(22–26). Radiomics, the use of quantitative data obtained from CT imaging to predict the risk of malignancy in a nodule, is another approach actively in development(27). Each of these novel tools will require both clinical validation and a determination of clinical utility before they can be incorporated into the paradigm for risk assessment of patients with indeterminate lung lesions, with the potential to improve care for both patients with lung cancer and those without(7, 28).