The characteristics and nomogram for primary lung lepidic adenocarcinoma

Background Lepidic adenocarcinoma (LPA) is an infrequent subtype of invasive pulmonary adenocarcinoma (ADC), and its clinicopathological features and prognostic factors have not been elucidated. We undertook a retrospective population-based analysis to examine the clinicopathological features of LPA and to construct nomograms predicting the long-term survival of LPA patients. Methods Data on 4087 LPA patients diagnosed between 2005 and 2014 were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database, retrospectively analyzed, and compared with non-LPA pulmonary ADC to explore the clinicopathological and prognostic features of LPA. All included patients were histologically confirmed. Patients with multiple primary tumors in their lifetime, unknown survival data, unknown tumor-node-metastasis (TNM) stage, or unknown age, race, or marital status were excluded. Univariate and multivariate Cox proportional hazards models were used to identify independent survival predictors for nomogram development. The nomograms were internally and externally validated using the concordance index and calibration plots, as well as decision curve analysis. Results Compared with non-LPA pulmonary ADC patients, those with LPA exhibited unique clinicopathological features, including higher proportions of elderly and female patients, smaller tumor size, less pleural invasion, and lower histological grade and stage. Multivariate analyses showed that age, sex, marital status, primary tumor size, pleural invasion, histological grade, stage, primary tumor surgery, and chemotherapy were independently associated with overall survival (OS) and cancer-specific survival (CSS) in patients with LPA, whereas race was independently associated with OS but not with CSS. The nomograms showed good agreement with the observed outcomes and demonstrated better prognostic capacity than TNM stage.
Conclusions LPA is more common in older and female patients. Smaller tumor size and lower histological grade and stage are characteristic clinicopathological features of LPA and may indicate a good prognosis. The constructed nomograms accurately predict the long-term survival of LPA patients.


Introduction
Lung cancer is the leading cause of cancer death and the most commonly diagnosed cancer worldwide [1]. Lepidic adenocarcinoma (LPA), also known as lepidic predominant adenocarcinoma or nonmucinous bronchioloalveolar carcinoma [2], is an infrequent subtype of lung adenocarcinoma (ADC) without precise incidence data. LPA is defined as a nonmucinous ADC with a lepidic predominant growth pattern that is > 3 cm in size, has an invasive focus > 5 mm, and/or shows lymphatic, vascular, or pleural invasion [3]. The definition was proposed by the International Association for the Study of Lung Cancer in 2011 and subsequently adopted by the World Health Organization (WHO) in 2015 [4]. LPA exhibits unique clinicopathological features, specific gene mutation profiles, and favorable survival outcomes compared with lung adenocarcinoma, not otherwise specified (NOS) [4][5][6]. However, few population-based studies have analyzed the demographic and clinicopathological features of LPA or the factors influencing its prognosis. Moreover, it is challenging for clinicians to predict a patient's prognosis accurately from tumor-node-metastasis (TNM) stage alone. Therefore, tools for estimating the probability of long-term survival in patients with LPA are needed.
The Surveillance, Epidemiology, and End Results (SEER) database, established in 1973 and covering about 28% of the US population [7], provides a wide range of demographic, clinical, and follow-up information on cancer patients. Using the SEER database, we retrospectively analyzed the clinicopathological features and survival of 4087 LPA patients to confirm their clinicopathological characteristics and prognostic factors. We then developed nomograms estimating overall survival (OS) and cancer-specific survival (CSS) for patients with LPA and evaluated their accuracy with internal and external validation as well as decision curve analysis (DCA). In addition, we estimated the incidence of LPA and explored the risk factors associated with distant and lymph node metastases of LPA.

Data source and selection
Patient data were obtained from the SEER database using SEER*Stat software, version 8.3.8 (https://seer.cancer.gov/seerstat/). Lung adenocarcinoma was classified according to the 2015 WHO classification system, and patients were identified using International Classification of Diseases for Oncology, third edition (ICD-O-3) histology codes. The inclusion criteria were: (1) primary lung cancer; (2) ICD-O-3 histology code 8250/3 (lepidic adenocarcinoma) or 8140/3 (adenocarcinoma, NOS); (3) positive histological confirmation; and (4) diagnosis between 2005 and 2014. The exclusion criteria were: (1) multiple primary tumors in the patient's lifetime; (2) unknown survival data or TNM stage; and (3) unknown age at diagnosis, race, or marital status, which are important and readily available in clinical practice. The demographic and clinicopathological features were compared between the LPA and ADC-NOS groups. After propensity score matching (PSM), OS and CSS were compared between ADC-NOS and LPA patients. To create and validate the nomograms, LPA patients diagnosed in 2009 and 2010 (n = 979) were assigned to the validation cohort, and those diagnosed in the remaining years from 2005 to 2014 (n = 3108) were assigned to the training cohort.
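The selection criteria and the temporal training/validation split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Case` record and its field names are hypothetical simplifications, not SEER*Stat variable names, and the code is not the procedure used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    # Hypothetical, simplified record; field names are illustrative only.
    primary_lung: bool
    histology: str                 # ICD-O-3 histology/behavior, e.g. "8250/3"
    histologically_confirmed: bool
    year_of_diagnosis: int
    multiple_primaries: bool
    survival_months: Optional[int]
    tnm_stage: Optional[str]
    age: Optional[int]
    race: Optional[str]
    marital_status: Optional[str]


# The two histology codes admitted by inclusion criterion (2).
STUDY_CODES = {"8250/3": "LPA", "8140/3": "ADC-NOS"}


def is_eligible(c: Case) -> bool:
    """Apply the four inclusion and three exclusion criteria to one case."""
    included = (
        c.primary_lung
        and c.histology in STUDY_CODES
        and c.histologically_confirmed
        and 2005 <= c.year_of_diagnosis <= 2014
    )
    excluded = (
        c.multiple_primaries
        or c.survival_months is None
        or c.tnm_stage is None
        or None in (c.age, c.race, c.marital_status)
    )
    return included and not excluded


def nomogram_cohort(c: Case) -> str:
    """Temporal split for LPA patients: 2009-2010 diagnoses validate, other years train."""
    return "validation" if c.year_of_diagnosis in (2009, 2010) else "training"
```

Splitting by calendar year rather than at random keeps the validation patients from a distinct diagnostic period, which is why the text can call this validation "external" within the same registry.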

Study variables
Demographic and clinicopathological variables of the included patients were extracted, including age, sex, race, marital status, tumor location, primary tumor size, separate tumor nodules, pleural invasion, histological grade, 6th edition TNM stage, treatment, vital status, survival time, cause of death, and the education and income levels of the county in which each patient resided. Races other than white and black were recorded as "Other". "Married (including common law)" was recorded as "Married", and all other marital statuses were recorded as "Single". Education and income were each categorized as "Low" or "High", according to whether the patient's county of residence was below or above the median level. Because survival time is recorded in whole months without day-level precision, a survival time of 0 months was recorded as 0.5 months. OS was defined as the time from diagnosis to death from any cause or last follow-up, and CSS as the time from diagnosis to death from lung cancer.
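The recoding rules above translate directly into small functions. This is a minimal sketch assuming hypothetical category labels (for example, a cause-of-death value of "lung cancer") rather than the exact SEER codes, and assuming that a county value exactly equal to the median counts as "Low", which the text does not specify.

```python
from typing import Optional, Tuple


def recode_race(race: str) -> str:
    """Races other than white and black are pooled as "Other"."""
    return race if race in ("White", "Black") else "Other"


def recode_marital(status: str) -> str:
    """Only "Married (including common law)" counts as married."""
    return "Married" if status == "Married (including common law)" else "Single"


def recode_county_level(county_value: float, median_value: float) -> str:
    """Dichotomize county education or income at the median.

    Assumption: a value exactly at the median is treated as "Low".
    """
    return "High" if county_value > median_value else "Low"


def survival_time(months: int) -> float:
    """SEER survival is in whole months; 0 months is recoded as 0.5 months."""
    return 0.5 if months == 0 else float(months)


def endpoint_events(alive: bool, cause_of_death: Optional[str]) -> Tuple[int, int]:
    """Return (OS event, CSS event): 1 = event, 0 = censored.

    OS counts death from any cause; CSS counts only death from lung cancer.
    The label "lung cancer" is a hypothetical coding of the cause of death.
    """
    if alive:
        return 0, 0
    return 1, int(cause_of_death == "lung cancer")
```

Recoding 0 months to 0.5 months keeps every patient with a positive follow-up time, which survival software generally expects.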

Statistical analysis
Categorical variables were summarized as counts and percentages. The chi-square test was used to compare demographic and clinicopathological characteristics between groups. PSM was used to minimize the impact of confounding factors. The propensity score of each patient with ADC-NOS or LPA was calculated with a logistic regression model including the following variables: age, sex, race, marital status, income and education level, primary tumor location and size, separate tumor nodules, pleural invasion, histological grade, TNM stage, and treatment. Caliper matching with a caliper width of 0.02 was performed between the two groups, and 4060 pairs of patients were successfully matched. After PSM, OS and CSS were compared between patients with ADC-NOS and LPA using Kaplan-Meier curves and the log-rank test. The data of LPA patients were then used for further analyses. Multivariate binary logistic regression analyses were performed to identify risk factors for distant and lymph node metastases in all LPA patients. Univariate and multivariate Cox proportional hazards models were used to calculate hazard ratios (HRs) with 95% confidence intervals (CIs) for variables associated with OS and CSS in the training cohort of LPA patients. Nomograms were constructed from the multivariate Cox analyses and evaluated by the concordance index (C-index) and calibration curves, which compare observed and nomogram-predicted survival outcomes. Finally, DCA was performed to compare the prognostic capacity of the nomogram model with that of TNM stage. To verify the applicability of the nomogram model, the nomograms were both internally and externally validated.
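The matching step can be illustrated with a short sketch. The text specifies only the caliper (0.02); this example assumes greedy 1:1 nearest-neighbour matching without replacement on the raw propensity-score scale, and it takes the scores as given instead of refitting the logistic regression. The function and patient IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def caliper_match(treated: Dict[str, float], controls: Dict[str, float],
                  caliper: float = 0.02) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching without replacement.

    `treated` and `controls` map patient IDs to propensity scores
    (e.g. LPA vs ADC-NOS). A pair is kept only if the absolute score
    difference is within `caliper`; unmatched patients are dropped.
    """
    available = dict(controls)
    pairs: List[Tuple[str, str]] = []
    # Fixed processing order (score, then ID) makes the result reproducible.
    for tid, ts in sorted(treated.items(), key=lambda kv: (kv[1], kv[0])):
        if not available:
            break
        # Closest remaining control; ties broken by ID.
        cid = min(available, key=lambda k: (abs(available[k] - ts), k))
        if abs(available[cid] - ts) <= caliper:
            pairs.append((tid, cid))
            del available[cid]  # without replacement
    return pairs
```

Greedy matching depends on processing order; optimal matching and matching on the logit of the score are common alternatives, and the text does not state which variant was used.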
Patients' ages were stratified using the X-tile program (Yale University, USA) [8]. Based on the age cutoff values for OS determined by X-tile analysis (Supplementary Figure S1A-C), patients were divided into three groups (0-69, 70-79, and 80+ years). All statistical analyses were performed using SPSS software version 21.0 (IBM Inc.) or R version 3.6.1 (http://www.r-project.org/). A two-tailed P < 0.05 was considered statistically significant.
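X-tile scores candidate cut points by how well the resulting groups separate in survival. The sketch below is a simplified, hypothetical analogue rather than X-tile itself: it tries one age cutoff at a time and keeps the split with the largest two-group log-rank chi-square. X-tile's two-cutoff (three-group) search and its correction for multiple testing are not reproduced.

```python
from typing import Optional, Sequence, Tuple


def logrank_chi2(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)  # at risk, group B
        d_a = sum(1 for ti, ei, g in data if g == 0 and ti == t and ei == 1)
        d = sum(1 for ti, ei, _ in data if ti == t and ei == 1)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n          # observed minus expected in group A
        if n > 1:                               # hypergeometric variance term
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0


def best_age_cutoff(ages: Sequence[int], times: Sequence[float], events: Sequence[int],
                    candidates: Sequence[int]) -> Optional[Tuple[int, float]]:
    """Return (cutoff, chi2) for the split age <= cutoff vs age > cutoff with the largest chi2."""
    best: Optional[Tuple[int, float]] = None
    for cut in candidates:
        low = [i for i, a in enumerate(ages) if a <= cut]
        high = [i for i, a in enumerate(ages) if a > cut]
        if not low or not high:
            continue  # a cutoff must leave patients on both sides
        chi2 = logrank_chi2([times[i] for i in low], [events[i] for i in low],
                            [times[i] for i in high], [events[i] for i in high])
        if best is None or chi2 > best[1]:
            best = (cut, chi2)
    return best
```

Because the cutoff is chosen to maximize the statistic, its nominal P value is optimistic. In small toy data such a search can also favor extreme cutoffs that isolate a few patients.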

Patients and tumor characteristics
Of the 1,244,493 patients diagnosed with a primary lung or bronchus malignancy in the SEER database between 1975 and 2016, 27,142 were diagnosed with LPA, accounting for 2.18% of all lung cancers. After applying the inclusion and exclusion criteria, 84,267 patients with lung ADC-NOS and 4087 patients with LPA were enrolled in our study. The demographic and clinicopathological characteristics of the eligible patients are shown in Table 1. Among the eligible patients, LPA was more common in older patients, female patients, and patients of "Other" race. In addition, patients with LPA tended to have smaller tumors, fewer separate tumor nodules, less pleural invasion, and lower histological grade and stage.
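The reported proportion follows directly from the two counts. It is the share of LPA among lung and bronchus cases registered in SEER over 1975-2016, not a population incidence rate:

```latex
\frac{27{,}142}{1{,}244{,}493} \approx 0.02181 \approx 2.18\%
```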
After PSM, Kaplan-Meier curves showed that patients with LPA had better survival outcomes than those with ADC-NOS (Supplementary Figure S2A-B). The median OS of the 4060 matched LPA patients was 49 months (Supplementary Figure S2A).
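The Kaplan-Meier (product-limit) estimator behind these curves, and the median OS read from it, can be sketched in a few lines. This is an illustrative implementation on toy data, not the software used in the study; censored patients remain at risk at their own censoring time.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival) steps of the Kaplan-Meier estimator.

    `events[i]` is 1 for death (the OS endpoint) and 0 for censoring.
    """
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        surv *= 1.0 - deaths / at_risk   # conditional survival past time t
        steps.append((t, surv))
    return steps


def median_survival(steps: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Earliest time at which the curve reaches 0.5 or below (None if never)."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None
```

For example, `median_survival(kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1]))` returns 5, the first time the estimated survival falls to 0.5 or below.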

Factors associated with distant and lymph node metastases
As shown in Supplementary Table S1, factors significantly associated with distant metastasis were identified by the chi-square test and further examined by multivariate analysis, which showed that "Other" race, larger tumor size, positive separate tumor nodules, and higher histological grade were independent risk factors for distant metastasis. For lymph node metastasis, age, sex, race, tumor size, separate tumor nodules, pleural invasion, and histological grade were significantly associated in the multivariate analysis (Supplementary Table S2).

Establishment of the nomograms predicting OS and CSS of LPA patients
In the training cohort, univariate analysis showed that age, sex, race, marital status, education, income, tumor location, primary tumor size, separate tumor nodules, pleural invasion, histological grade, TNM stage, primary tumor surgery, radiotherapy, and chemotherapy were significantly associated with OS (Table 2). Further multivariate analysis showed that age, sex, race, marital status, primary tumor size, pleural invasion, histological grade, TNM stage, primary tumor surgery, and chemotherapy were significantly associated with OS, whereas for CSS, multivariate analysis identified age, sex, marital status, primary tumor size, pleural invasion, histological grade, TNM stage, primary tumor surgery, and chemotherapy as significant factors (Table 2). Based on these multivariate results, two nomograms predicting 1- and 5-year OS (Figure 1) and CSS (Figure 2) were constructed with the independent variables.
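For reference, the Cox proportional hazards model underlying Table 2 relates the hazard of a patient with covariates x_1, ..., x_p to a baseline hazard h_0(t). Each hazard ratio and its 95% CI follow from the estimated coefficient and its standard error (the standard Wald interval), and the predicted survival that a nomogram reports is the baseline survival S_0(t) raised to the exponentiated linear predictor:

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \cdots + \beta_p x_p\right), \qquad
\mathrm{HR}_j = e^{\hat\beta_j}, \qquad
95\%\ \mathrm{CI} = \left[e^{\hat\beta_j - 1.96\,\mathrm{SE}(\hat\beta_j)},\; e^{\hat\beta_j + 1.96\,\mathrm{SE}(\hat\beta_j)}\right]

S(t \mid x) = S_0(t)^{\exp\left(\hat\beta_1 x_1 + \cdots + \hat\beta_p x_p\right)}
```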
To use the nomograms, each variable is first assigned a score using the point scale at the top of the nomogram. The scores are then summed, and the total is located on the point scale at the bottom of the nomogram to estimate an individual patient's survival probability.
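The point-and-sum procedure can be sketched for a Cox-based nomogram. All coefficients and the baseline 5-year survival below are hypothetical placeholders, not the estimates from this study (those are in Table 2 and Figures 1-2). The sketch assumes a common scaling convention: the variable with the widest coefficient range spans 0-100 points, and total points map back to the linear predictor lp, from which survival is S0(t) raised to exp(lp).

```python
import math
from typing import Dict

# Hypothetical log-hazard coefficients per category (reference level = 0).
# These are placeholders, NOT the coefficients estimated in this study.
COEFS: Dict[str, Dict[str, float]] = {
    "age": {"0-69": 0.0, "70-79": 0.4, "80+": 0.9},
    "grade": {"I": 0.0, "II": 0.3, "III": 0.7},
    "stage": {"I": 0.0, "II": 0.8, "III": 1.4, "IV": 2.0},
}
BASELINE_S5 = 0.85  # hypothetical 5-year survival when the linear predictor is 0


def max_range() -> float:
    """Widest coefficient range across variables; it is mapped to 100 points."""
    return max(max(v.values()) - min(v.values()) for v in COEFS.values())


def points(patient: Dict[str, str]) -> Dict[str, float]:
    """Top scale: points per variable, measured from that variable's lowest coefficient."""
    scale = 100.0 / max_range()
    return {var: (COEFS[var][level] - min(COEFS[var].values())) * scale
            for var, level in patient.items()}


def five_year_survival(total_points: float) -> float:
    """Bottom scale: total points -> linear predictor -> S0(5 y) ** exp(lp)."""
    lp = total_points * max_range() / 100.0 + sum(min(v.values()) for v in COEFS.values())
    return BASELINE_S5 ** math.exp(lp)


if __name__ == "__main__":
    patient = {"age": "80+", "grade": "II", "stage": "I"}
    pts = points(patient)
    print(pts, round(five_year_survival(sum(pts.values())), 3))
```

Because points are a linear rescaling of the linear predictor, summing points and reading the bottom axis gives the same answer as evaluating the Cox model directly; the nomogram is a graphical calculator for that model.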

Nomogram validation
Internal and external validation were performed for both the OS and CSS nomograms. Internal validation used the training cohort, in which the C-index values of the nomograms predicting OS and CSS were 0.786 (95% CI, 0.776-0.796) and 0.812 (95% CI, 0.802-0.822), respectively. External validation used the validation cohort, in which the corresponding C-index values were 0.781 (95% CI, 0.762-0.800) and 0.812 (95% CI, 0.793-0.831). Furthermore, internal and external calibration plots both indicated excellent agreement between predicted and observed survival outcomes for the OS and CSS nomograms (Figure 3A-H). In addition, the DCA results demonstrated that the nomograms had better prognostic capacity than TNM stage (Supplementary Figure S3A-D).
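The two headline validation metrics can be sketched as follows: Harrell's C-index, which measures how often the patient who dies earlier was assigned the higher predicted risk, and the decision-curve net benefit at a single threshold probability. Both are simplified illustrations on toy inputs, not the study's implementation. Pairs with tied survival times are skipped in the C-index, and `net_benefit` assumes a binary outcome known at the time horizon, ignoring the censoring adjustments that survival DCA normally applies. Comparing the nomogram with TNM stage would amount to evaluating `net_benefit` with each model's predicted risks across a range of thresholds.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when patient i had the event and t_i < t_j.
    It is concordant when i (who failed first) has the higher risk score;
    tied risk scores count as half-concordant. Tied times are skipped.
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


def net_benefit(predicted_risk: Sequence[float], event_by_horizon: Sequence[int],
                threshold: float) -> float:
    """Decision-curve net benefit at threshold probability p_t.

    NB = TP/n - FP/n * p_t / (1 - p_t), where patients with predicted
    risk >= p_t are classified as high risk.
    """
    n = len(predicted_risk)
    tp = sum(1 for r, y in zip(predicted_risk, event_by_horizon) if r >= threshold and y)
    fp = sum(1 for r, y in zip(predicted_risk, event_by_horizon) if r >= threshold and not y)
    return tp / n - fp / n * threshold / (1 - threshold)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values near 0.78-0.81 indicate good discrimination. In DCA, a model is preferred at a given threshold when its net benefit exceeds that of the competing model and of the treat-all and treat-none strategies.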

Discussion
Concise and accurate prognostic prediction models for patients with malignancy are essential for clinical decision-making and scientific research. Clinically, TNM stage is the most widely used survival predictor for cancer patients, but it does not account for the demographic and treatment factors that also influence survival.
Therefore, identifying additional prognostic factors and building a more individualized model should improve the accuracy of clinical outcome prediction. In this study, we used the SEER database, a large-scale population-based cancer registry, to explore the clinical characteristics of 4087 patients with LPA and identified the factors associated with distant and lymph node metastases in LPA patients. We then developed and validated personalized prognostic nomograms predicting 1- and 5-year OS and CSS of patients with LPA, which demonstrated better prognostic capacity than TNM stage.
Survival outcomes of LPA patients with poor prognostic factors are unfavorable; the median OS of patients with advanced LPA was 20.1 months [9]. However, the prognosis of advanced LPA could be improved by appropriate treatments, including chemotherapy and EGFR tyrosine kinase inhibitors (TKIs) [9]. The 5-year disease-free survival of LPA patients after complete surgical resection was about 90% [10]. Based on the nomograms generated in our study, more aggressive treatment is recommended for high-risk patients with LPA, and appropriately shortened follow-up intervals are encouraged to detect endpoint events as early as possible. For example, older, unmarried, black men with large tumors and advanced TNM stage should be followed up more frequently and treated more aggressively, including primary tumor resection when they meet the surgical criteria.
Compared with other rare histologic subtypes of lung cancer, such as papillary adenocarcinoma [11] and carcinosarcoma [12], our results suggested that the incidence of LPA was much higher. Our results indicated that female sex and older age were strongly associated with LPA rather than ADC-NOS, consistent with previous studies [13,14]. In addition, several clinicopathological features of LPA patients indicate a good prognosis, including smaller tumor size, fewer separate tumor nodules, less pleural invasion, and lower histological grade and stage. This is consistent with previous studies [15] and with the favorable prognosis of LPA [3,13,15]. Moreover, LPA has some characteristics that differ from other histologic subtypes of invasive pulmonary ADC, such as being more common in non-smokers or light smokers, a preference for peripheral pulmonary location, and a tendency toward false-negative results on positron emission tomography [13,16]. Clinically, absence of symptoms at presentation and excessive airway secretion were more common in patients with LPA [17]. In terms of genetic alteration profiles, EGFR mutations occur in about 50% of patients with LPA, significantly more often than in other subtypes [5], especially mutations in exon 21 [17,18], whereas KRAS mutations are much less common, accounting for about 10% of the LPA population [5]. Compared with other histologic subtypes, a lower rate of ALK rearrangement and a higher rate of RET rearrangement have been reported [6,19,20].
Most studies support that patients with LPA have favorable survival outcomes compared with other subtypes of invasive pulmonary ADC. Regarding treatment, surgery remains the preferred option for LPA patients, whereas adjuvant chemotherapy, including oral fluoropyrimidines and platinum-based regimens, conferred no survival benefit on patients with LPA regardless of tumor stage at presentation [21,22]. In patients with advanced LPA, studies suggested that taxane-based chemotherapy and pemetrexed might be effective and well tolerated [23,24]. Given the higher frequency of EGFR mutations, EGFR-TKIs as first- or second-line treatment for advanced LPA demonstrated encouraging efficacy [9]. Nevertheless, owing to the lower expression level of programmed cell death-ligand 1, immune checkpoint inhibitors may have poor efficacy in patients with LPA [25][26][27]. Moreover, multiple studies suggested that a higher percentage of lepidic growth pattern was associated with a lower risk of recurrence, and that invasive component size was a better predictor of survival than overall tumor diameter [16,17,28,29]. Furthermore, no recurrence was observed in any of 18 LPA patients with a maximum tumor diameter > 3 cm but a maximum invasive diameter < 5 mm [30]. Therefore, Suzuki et al. [30] proposed that LPA with invasion of 5 mm or less can be regarded as minimally invasive ADC even if the tumor is larger than 3 cm in diameter. Unsurprisingly, our results suggested that primary tumor surgery was a major prognostic factor for LPA patients, second only to histological grade and stage. By contrast, chemotherapy was far less important to the prognosis of LPA patients.
Furthermore, although radiotherapy was associated with OS in the univariate analysis, it was not independently associated with the survival outcomes of LPA patients in the multivariate analysis. Unfortunately, we could not explore the prognostic significance of chemotherapy regimens, targeted therapy, immunotherapy, or the diameter of the invasive area.
In the current study, we identified that age, sex, marital status, primary tumor size, pleural invasion, histological grade, TNM stage, primary tumor surgery, and chemotherapy were independently associated with OS and CSS in patients with LPA. Notably, few patients with histological grade IV LPA were included in this study; therefore, our nomograms are not suitable for predicting survival in patients with grade IV LPA. Consistent with previous studies, our results suggested that treatment, tumor size, and some demographic characteristics also affect the prognosis of LPA patients, and we provide a statistical prediction tool that incorporates and quantifies the selected prognostic factors to estimate the survival outcome of an individual patient.
To our knowledge, this is the first study to elucidate the demographic and clinicopathological features, as well as the incidence, of LPA using a large-scale population-based database. These are also the first nomograms predicting survival outcomes in LPA patients, and they could aid personalized prognostic evaluation and clinical decision-making. However, our study has some limitations, although the nomograms demonstrated good accuracy and applicability. First, the nomograms were constructed from retrospective data, and prospective validation is needed. Second, some critical information, such as the diameter of the invasive area in LPA, tumor biomarkers, chemotherapy regimens, targeted therapy, molecular pathology, and genetic tests, was absent from the database, so we could not analyze these variables or use them to improve the nomograms. Third, almost all patients were from the United States, and the results might differ in other populations. Such drawbacks are inherent to almost all retrospective population-based studies. However, the large sample size and long follow-up duration of the present study compensate to a great extent and provide comprehensive knowledge of LPA. Further prospective studies with more detailed information are needed for model improvement and independent validation.