Development and Validation of a Nomogram Predicting the Overall Survival for Patients With Primary Gastric Mucosa-Associated Lymphoid Tissue Lymphoma: A SEER-Based Study


Purpose: To create an effective survival nomogram for patients with primary gastric mucosa-associated lymphoid tissue (MALT) lymphoma (GML). Methods: All data of patients with primary GML from 2004 to 2015 were collected from the Surveillance, Epidemiology and End Results (SEER) database. The primary endpoint was overall survival (OS). Based on LASSO and Cox regression, we created a survival nomogram model and verified its accuracy and effectiveness using the concordance index (C-index), calibration curves and time-dependent receiver operating characteristic (td-ROC) curves. Results: A total of 2604 patients diagnosed with primary GML were selected for this study. Of these, 1823 and 781 patients were randomly assigned to the training and testing sets at a ratio of 7:3. The median follow-up of all patients was 71 months, and the 3- and 5-year OS rates were 87.2% and 79.8%, respectively. Age, sex, race, Ann Arbor stage and radiation were independent risk factors for OS of primary GML (all P < 0.05). The C-index values of the nomogram were 0.751 (95% CI: 0.729-0.773) and 0.718 (95% CI: 0.680-0.757) in the training and testing cohorts, respectively, showing the good discrimination ability of the nomogram model. Td-ROC curves and calibration plots also indicated satisfactory predictive power and good agreement of the model. Overall, the nomogram shows favorable performance in discriminating and predicting the OS of patients with primary GML. Conclusions: A nomogram based on five independent clinical risk factors for OS was developed and validated, and it showed good survival predictive performance for patients with primary GML. Nomograms are a low-cost and convenient clinical tool for assessing individualized prognosis and treatment for patients with primary GML.


Introduction
Extranodal marginal zone B-cell lymphoma of mucosa-associated lymphoid tissue, known as MALT lymphoma, is a type of non-Hodgkin's lymphoma [1,2]. Approximately one-third of all MALT lymphomas and 85% of gastrointestinal MALT lymphomas arise in the stomach [3,4]. Primary gastric MALT lymphoma (GML) is rare and accounts for only approximately 5% of all primary gastric neoplasms [5].
This disease demands close attention. Researchers have observed that GML patients have a significant risk of atrophic gastritis, intestinal metaplasia [6,7] and secondary tumors [8]. The incidence of gastric adenocarcinoma is 6 times higher in GML patients than in the healthy population [9,10].
The prognosis of primary GML patients can be affected by many factors. Clinical risk factors, including age, type of therapy, sex, stage and family history of hematologic malignancy, have significant effects on the development of the disease [11][12][13][14]. Previous studies demonstrated a good prognosis of the disease, with 5-year survival rates of up to 99%. However, more than 95% of these studies were based only on stage I/II patients with small sample sizes, and a few of them used different staging standards [15][16][17][18][19]. Matysiak-Budnik et al. [11] conducted a multicenter study in France in which 416 GML patients were retrospectively enrolled. They found that 25% of subjects were diagnosed at stage III/IV and that 11% of patients had missed or false diagnoses, which was similar to other studies [20][21][22].
Furthermore, the available data focus mainly on epidemiology, and few studies have investigated the prognostic variables for overall survival (OS) in patients with primary GML. We therefore retrieved a large dataset of patients diagnosed with primary GML from the SEER database. The aim was to develop and verify a survival nomogram model that can predict the OS of primary GML by combining prognostic and determinant variables.

Patient selection and data extraction from the SEER database
The data for our study were extracted from the SEER database (Username: 12262-November 2019, software version: SEER*Stat 8.3.6). Because the database is publicly available, approval from the Ethics Committee was not required.
All patients diagnosed with primary GML from 2004 to 2015 were ultimately included in this study.
Subjects meeting any of the following conditions were excluded: (1) hospitalized death and autopsy source patients; (2) patients with a history of tumors; (3) patients who were not followed up or who were lost to follow-up; (4) age at diagnosis < 20 years; (5) unknown data (race, stage, cause of death); and (6) survival time < 1 month.
A total of 2604 selected GML patients were randomly assigned to the training and validation sets with a ratio of 7:3. There were 1823 and 781 people in the two groups, respectively.
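The 7:3 random assignment described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_cohort`, the use of integer patient IDs and the random seed are all assumptions, since the paper does not report how the randomization was implemented.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs and split them into training and validation sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # the seed is an assumption; none is reported
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 2604 hypothetical patient IDs, matching the cohort size in the text
train_ids, valid_ids = split_cohort(range(2604))
print(len(train_ids), len(valid_ids))  # 1823 781
```

Rounding 0.7 × 2604 = 1822.8 to the nearest integer reproduces the reported group sizes of 1823 and 781.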
The clinical covariates included sex, age, race, primary site, Ann Arbor stage, surgery, chemotherapy, and radiation. Data on the survival months and vital status of patients were also analyzed. The primary endpoint was OS.

Establishment And Verification Of The Survival Nomogram
In the training set, the least absolute shrinkage and selection operator (LASSO) and multivariate Cox proportional hazard regression were combined to identify the significantly correlated prognostic factors that influenced OS. The nomogram model was established based on the above results. Meanwhile, primary GML patients were divided into low- and high-risk groups at the cutoff point of the risk score.
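For readers unfamiliar with LASSO-penalized Cox regression, the quantity being minimized can be sketched as the negative Cox partial log-likelihood plus an L1 penalty on the coefficients. The code below is an illustrative sketch only: the function name, the Breslow-style risk set and the penalty weight `lam` are assumptions, and the paper does not state which software or penalty value was used. Some implementations also divide the likelihood term by the sample size.

```python
import math

def lasso_cox_objective(beta, X, times, events, lam):
    """Negative Cox partial log-likelihood (Breslow form) plus an L1 penalty."""
    # Linear predictor for each patient: eta_i = sum_j beta_j * x_ij
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    log_lik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:  # only observed deaths contribute a term
            # Risk set: everyone still under observation at time t_i
            risk_sum = sum(math.exp(eta[k])
                           for k, t_k in enumerate(times) if t_k >= t_i)
            log_lik += eta[i] - math.log(risk_sum)
    penalty = lam * sum(abs(b) for b in beta)  # L1 term shrinks weak coefficients to zero
    return -log_lik + penalty

# Toy data (made up): one binary covariate, three patients, all events observed
X = [[0], [1], [0]]
times = [1, 2, 3]
events = [1, 1, 1]
value_at_zero = lasso_cox_objective([0.0], X, times, events, lam=0.1)
```

With all coefficients at zero the penalty vanishes and the objective reduces to log(3) + log(2) + log(1) = log(6), which gives a quick sanity check. Larger values of `lam` push more coefficients to exactly zero, which is how the LASSO step screens candidate variables before the multivariate Cox model is fitted.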
Scatter plots, forest plots and Kaplan-Meier curves were generated to visually compare the OS times of patients in the two different risk groups.
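The risk grouping and survival comparison can be illustrated with a short sketch. It assumes a split at the median risk score (as reported in the Results) and uses the standard Kaplan-Meier product-limit estimator; the function names and the toy data are hypothetical and are not taken from the SEER cohort.

```python
from statistics import median

def split_by_median(risk_scores):
    """Label a patient 'high' if the risk score exceeds the cohort median, else 'low'."""
    cutoff = median(risk_scores)
    return ["high" if r > cutoff else "low" for r in risk_scores]

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time (events: 1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve

# Toy example (made-up values): follow-up in months, vital status, risk score
times = [5, 8, 12, 20, 30, 40]
events = [1, 1, 0, 1, 0, 1]
risk = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
groups = split_by_median(risk)
for label in ("low", "high"):
    idx = [k for k, g in enumerate(groups) if g == label]
    print(label, kaplan_meier([times[k] for k in idx], [events[k] for k in idx]))
```

Plotting the two step functions returned by `kaplan_meier` gives the kind of curve comparison described in the text; a log-rank test or Cox model would then be needed to obtain the P value and hazard ratio.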
Internal validation was performed on the patients in the validation set. The discriminatory performance of the model was measured by the concordance index (C-index) value [23]. We also adopted time-dependent receiver operating characteristic (td-ROC) curves to assess the 3- and 5-year OS predictive power of the nomogram. Additionally, calibration plots were applied to compare the agreement between actual and predicted probabilities.
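As an illustration of what the C-index measures, the sketch below implements one common estimator (Harrell's) for right-censored data. It is an assumption that this matches the estimator used in the study; the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs in which the higher-risk patient died first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died strictly before
            # patient j's last follow-up time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Toy data (made up): the third patient is censored
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [4, 3, 2, 1])
reversed_order = harrell_c_index(times, events, [1, 2, 3, 4])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported values of 0.751 and 0.718 indicate that the model ranks roughly three out of four comparable pairs correctly.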

Clinical characteristics of subjects
The specific clinical and pathological characteristics of all enrolled patients and the training and testing groups are presented in Table 1.
To establish the nomogram model, 1823 subjects were randomly assigned to the training cohort, while 781 were assigned to the validation cohort. No significant difference in variables was observed between the two groups (all P > 0.05).

Multivariate Risk Factor Analysis And Establishment Of The Nomogram
In the training set, the LASSO Cox regression model was adopted to filter risk factors for OS. The results revealed that 5 of the 8 factors were significantly associated with 3- and 5-year OS; details are given in Fig. 1 and Table 2. Increased age, male sex, black race and higher disease stage were inversely correlated with survival, while radiation treatment showed a positive correlation with survival (all P < 0.05). Based on these five variables, we established a survival nomogram to calculate the probability of 3- and 5-year OS in primary GML patients (Fig. 2). The C-index of the nomogram was 0.751 (95% CI: 0.729-0.773), demonstrating satisfactory discrimination ability. Patients were separated into high- and low-risk groups according to the median risk score (cutoff: 0.28, Fig. 3A). A scatter plot (Fig. 3B) showed a shorter survival time and a higher mortality rate in high-risk patients. The Kaplan-Meier curve (Fig. 5B) revealed that low-risk patients had a better OS than high-risk patients (P < 0.0001, HR 0.190, 95% CI 0.153-0.236), which is consistent with the results shown in Fig. 4 (all HR > 2). Td-ROC curves also showed good predictive power of the nomogram for 3- and 5-year OS. The areas under the curve (AUCs) were 0.727 (Fig. 6A) and 0.734 (Fig. 6B), respectively. Moreover, the calibration curves of the model, shown in Fig. 7A-B, confirmed a high agreement between actual and predicted survival proportions. All the above results indicate the good discrimination and predictive capacities of the model.
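To make the nomogram construction concrete, the sketch below shows one common way of turning Cox coefficients into points: each variable's contribution to the linear predictor is rescaled so that the variable with the largest possible effect spans 0 to 100 points. This is a hedged illustration under stated assumptions. The coefficients, variable names and value ranges are hypothetical and are not the fitted values from Table 2, and the paper does not describe its scaling rule.

```python
def nomogram_points(patient, betas, value_ranges):
    """Return the total nomogram points for one patient."""
    # Largest possible contribution of each variable to the linear predictor
    spans = {v: abs(betas[v]) * (hi - lo) for v, (lo, hi) in value_ranges.items()}
    scale = 100.0 / max(spans.values())  # strongest variable spans 0-100 points
    total = 0.0
    for v, (lo, hi) in value_ranges.items():
        # Shift so that the lowest-risk value of each variable scores 0 points
        lowest = min(betas[v] * lo, betas[v] * hi)
        total += (betas[v] * patient[v] - lowest) * scale
    return total

# Hypothetical log hazard ratios (NOT the values fitted in this study)
betas = {"age_decades": 0.5, "male": 0.3, "radiation": -0.4}
value_ranges = {"age_decades": (2, 9), "male": (0, 1), "radiation": (0, 1)}

highest_risk = {"age_decades": 9, "male": 1, "radiation": 0}
lowest_risk = {"age_decades": 2, "male": 0, "radiation": 1}
print(nomogram_points(highest_risk, betas, value_ranges))
print(nomogram_points(lowest_risk, betas, value_ranges))
```

In a full nomogram the total points are then mapped to predicted 3- and 5-year OS through the baseline survival function of the fitted Cox model, a step that is omitted here because it requires the fitted model itself.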

Internal Validation In The Validation Set
To better validate the nomogram model, we carried out the same analyses in the validation set. The C-index in the validation set was 0.718 (95% CI: 0.680-0.757), indicating good predictive accuracy of the nomogram model. There were significant survival differences between the low- and high-risk groups on Kaplan-Meier curves (HR 0.233, 95% CI 0.174-0.312, P < 0.0001, Fig. 5C). As shown in Fig. 6C-D, the AUC values of the 3- and 5-year OS td-ROC curves were 0.689 and 0.715, respectively, which were similar to the results in the training set, further confirming the reliable predictive ability of the model. The calibration plots in the training and validation groups were also similar (Fig. 7C-D). In conclusion, the survival nomogram model displayed favorable performance in discriminating and predicting the OS of primary GML patients in both sets.

Discussion
Primary GML is recognized as a rare, low-grade lesion, and the main risk of the disease is histological transformation to diffuse large B-cell lymphoma [19,25]. Studies mostly focus on epidemiology [26,27] and on how prognosis is affected by different treatment methods [14,22,28,29]. Few large studies have reported the relationship between clinical variables and survival in primary GML. To our knowledge, no survival model has yet been established for predicting the prognosis of patients with primary GML. Our study constructed and validated a nomogram model to predict the overall survival of the disease based on clinical and pathological risk factor analysis. We demonstrated that age, sex, race, Ann Arbor stage and radiation therapy were independent risk factors for OS of primary GML. The nomogram showed good predictive ability for disease prognosis.
Considering the indolent natural course of the disease, very large clinical datasets with long-term follow-up are needed. In our study, all of the data were obtained from the SEER database. This is a national cancer database gathering a large amount of data from different hospitals in the United States, and patient information is strictly managed and reliable [30]. Meanwhile, we developed and validated a survival nomogram. In recent years, nomograms have been widely used as an effective tool for cancer prognosis [31][32][33]. Tailored to the specific information of every patient, the nomogram can visually present the probability of a disease event (such as OS) as a single number [34]. In summary, on the basis of high-quality data and validation analyses, our model has good clinical prediction ability and can be applied to clinical work.
The 3- and 5-year OS rates calculated in our study were 87.2% and 79.8%, respectively. In 2019, the first national population-based study was conducted in France. It reported that the 5-year OS of the whole population was 79% (95% CI [75-83]) [11]. Our rate is similar to those in published studies and may reflect a better prognosis for GML than for other gastric malignancies.
In 2017, Thieblemont et al. [35] generated a novel MALT lymphoma prognostic index (MALT-IPI), including age ≥ 70 years, elevated LDH levels and Ann Arbor stage III or IV. They concluded that this index would be an effective method to predict poor outcome for MALT lymphoma. A similar conclusion was reached in other studies [11,14,19,29]. Matysiak-Budnik et al. [11] conducted a multicenter retrospective study in France, including 416 cases of GML. They found that 5-year OS was better for patients younger than 67 years than for older patients (93.6% vs 68.5%, P < 0.0001). Another multicenter cohort follow-up study of 420 patients found that age (each incremental year) was an independent prognostic factor for OS (P = 0.024) [14]. In our study, age carried the highest hazard ratio and showed a significant correlation with the prognosis of primary GML (P < 0.0001, HR 19.843). The nomogram indicated that increased age, especially > 65 years, had a negative impact on OS. Although several researchers found no association between age and prognosis [20,29], the insufficient number of subjects in these studies needs to be considered. Therefore, we maintain that age and prognosis are closely related. Regardless of the specific age cutoff, it is recognized that advanced age means an increased risk in primary GML [13].
Multivariate analysis also indicated that Ann Arbor stage was a significant independent prognostic factor for the disease. In our study, most patients presented with stage I-II disease (89.9%), and the prognosis worsened as the stage increased (all P < 0.05). The proportion of patients with localized and advanced disease at diagnosis varies among reported series [11,14,36], and there is a consensus that the prognosis of these patients differs [13].
The percentage of female patients was higher than that of male patients (52.0% vs 48.0%), and most patients were white (79.07%). Statistically significant correlations were found between male sex (P < 0.001), black race (P = 0.036) and primary GML overall survival. Whether sex is related to the disease remains controversial. Some studies have reported that males have a 2-3 times higher incidence and a worse prognosis than females [37,38], while other studies have not [11,35]. Until now, no study has focused on the relationship between race and primary GML. SEER provided us with detailed race data, which indicated that black race was associated with worse OS in primary GML (HR 1.374, 95% CI 1.020-1.779). More studies are still needed to further investigate prognostic factors in primary GML.
Management and treatment guidelines for MALT lymphomas were extremely heterogeneous until the last few years. Over the past 2-3 decades, eradication of H. pylori has been the preferred choice for GML regardless of the histological H. pylori status [21,39]. Chemotherapy and radiation therapy (RT) were only suggested as second-line therapies for nonresponders or advanced patients [39]. In contrast, ESMO guidelines suggested that RT might be the first option for GML patients with localized stages, and that chemotherapy was an effective method in patients with all stages [40]. Radiation therapy alone was also reported to have excellent treatment effects on GML with a total dose of 24-30 Gy [29,41]. In other trials, surgery showed no advantage over other therapies [42]. In our study, no survival difference was found between patients receiving medication and those receiving surgical treatment. Multivariate analysis showed that only RT was significantly associated with better disease prognosis (P < 0.001). These data are consistent with previous studies of RT for gastric MALT lymphoma. Overall, radiotherapy is a good choice for primary GML.
This study has several limitations. First, this is a retrospective analysis. All information is from 2004-2015, and part of the data was recorded before the publication of guidelines for primary GML. This may cause heterogeneity in the clinical management of patients, resulting in data bias. Second, SEER cannot provide specific details of treatment methods, and some data fields are labeled as unknown. Third, we have no external verification data. We collected a few cases of primary GML, but the number was too small to support a reliable analysis. More multicenter, prospective datasets of primary GML are still necessary for further investigation.

Conclusion
In conclusion, a nomogram based on five independent clinical risk factors for OS was developed and validated, and it showed good survival predictive performance for primary GML patients. Nomograms are a low-cost and convenient clinical tool for assessing individualized prognosis and treatment for patients with primary GML.

Abbreviations
MALT: mucosa-associated lymphoid tissue; GML: gastric mucosa-associated lymphoid tissue lymphoma; SEER: Surveillance, Epidemiology and End Results; OS: overall survival; LASSO: least absolute shrinkage and selection operator; C-index: concordance index; td-ROC: time-dependent receiver operating characteristic; AUC: area under the curve; RT: radiation therapy.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analysed during this study are included in this published article [and its supplementary information files].

Competing interests
All authors declare no conflict of interest in this study.