Risk Models Based On Populations In High-Risk Areas of Oesophageal Cancer In China For Improving The Oesophageal Cancer Diagnosis Rate


Background: Developing an effective risk prediction model is of great significance for improving the diagnostic efficiency of early oesophageal cancer. This study aimed to identify a population at high risk of oesophageal squamous cell carcinoma (ESCC) based on a population screening model. Methods: From January 2016 to September 2019, a total of 6409 subjects were screened from 120 target townships randomly selected from 150 villages in Nanchong City, Sichuan Province, China. Each subject underwent standard endoscopy, narrow band imaging (NBI), and iodine staining-guided biopsy to evaluate oesophageal cancer and precancerous lesions. Before endoscopy, the subjects completed a questionnaire about ESCC risk factors. Variables were evaluated by univariate analysis, and variables significantly related to ESCC were identified using a logistic regression model. We used the Akaike information criterion to determine the final model structure and the coding form of variables with multiple metrics. We developed two sets of models that defined severe dysplasia and above (SDA) and moderate dysplasia and above (MDA), respectively, as the outcome events. Results: The areas under the receiver operating characteristic curve (AUROC) were 0.896 (95% CI, 0.888-0.903) and 0.825 (95% CI, 0.816-0.835) for our SDA and MDA models, respectively. MDA-related and SDA-related factors included age, sex, cigarette smoking, alcohol drinking, pharyngeal foreign body sensation, swallowing obstruction, pain behind the sternum, and discomfort behind the breastbone. Conclusions: We developed an easy-to-use model to identify individuals at high risk of dysplasia or oesophageal cancer in high-risk areas of oesophageal cancer in China.


Introduction
Oesophageal cancer (EC) is one of the most common malignant tumours [1][2][3], with a poor prognosis and low survival rate [4]. Its incidence ranks eighth among malignant tumours worldwide, and it is the fourth most common type of cancer in China [5][6][7]. By the time EC is detected, most patients have advanced disease and have lost the best opportunity for treatment; the 5-year survival rate is less than 25% [8]. Early diagnosis of oesophageal squamous cell carcinoma (ESCC) and precancerous lesions can not only reduce mortality and morbidity but also improve patients' prognosis [8][9][10]. It is therefore extremely important to study the early diagnosis of EC and its precancerous lesions, especially for screening high-risk populations in areas with a high incidence of EC [11].
Endoscopy can diagnose EC early and increase the proportion of curable cases of oesophageal malignant tumours by up to 30% [12][13][14]. However, both the cost of endoscopy and its effectiveness in improving cancer patients' survival remain uncertain [15,16]. With increasing health awareness and active education and disease management in high-incidence areas of EC, gastrointestinal endoscopy has become more common [17]. Nevertheless, China has a large population, and endoscopic screening is neither cost-effective nor feasible, even in areas with a high incidence of EC.
Using a risk prediction model to distinguish high-risk individuals in the general population would be an effective strategy for diagnosing ESCC. Many studies have addressed the prediction of upper gastrointestinal tumours [18,19]. However, previous prediction models were often derived from hospital-based case-control studies rather than large-scale population data, which limits their ability to predict individual risk, and the screening models targeted the upper digestive tract as a whole (stomach and oesophagus) [20][21][22]. Such screening models cannot meet the need for individual diagnosis of Chinese patients with EC. Therefore, to improve the diagnostic efficiency of early EC, it is of great significance to develop an effective risk prediction model. Such a model must be based on risk factors closely related to the occurrence and development of EC. Current research shows that the main risk factors are age, poor nutritional status, low intake of fruits and vegetables [23], family history of EC [24], high-temperature food intake [25], chemical carcinogen exposure [26], cigarette smoking and alcohol consumption [27], and human papillomavirus infection [28]. However, prediction models of EC constructed with these risk factors are not effective. Studies have shown that prediction models based on the symptoms of tumour patients are highly efficient. Therefore, we analysed patients' symptoms with the aim of developing an EC risk prediction model, mainly through a large-scale endoscopic screening trial. The goal was to develop a method that makes maximal use of all information on age, warning symptoms, and lifestyle characteristics and is better suited to the diagnosis of ESCC.

Study design and population
The research design and study criteria are shown in Figure 1. In January 2016, we launched the China Endoscopic Oesophageal Cancer Screening trial in towns and villages in Nanchong, Sichuan Province, China, to evaluate and identify groups at high risk of upper gastrointestinal cancer and high-risk groups that should undergo endoscopy. One hundred twenty target towns were randomly selected, and 6409 subjects were screened. Each subject underwent standard endoscopy, narrow band imaging (NBI), and iodine staining-guided biopsy to evaluate for EC and precancerous lesions. Screening and early diagnosis were conducted for groups at high risk of upper digestive tract cancer. Precancerous lesions and early cancers amenable to intervention were treated promptly with minimally invasive endoscopic therapy once found. Inclusion criteria were (1) age ≥40 years, male or female; (2) high incidence of upper gastrointestinal cancer but no gastroscopy in the past 5 years; (3) no contraindications to gastroscopy, such as severe cardiopulmonary dysfunction; (4) no history of diagnosis of or surgery for upper gastrointestinal tumours; and (5) willingness to complete this study. Our prediction model was based on the 6409 patients seen in the screening department from January 2016 to September 2019, whose endoscopy cases were used to construct and verify it.

Samples and data collection
All patients underwent endoscopy and were asked to stop using proton pump inhibitors or H2 receptor antagonists for at least 2 weeks before the examination to avoid masking gastrointestinal malignancies [29]. Endoscopy was performed with ordinary white light, NBI, and Lugol iodine staining. The oesophagus and stomach were examined, and biopsies were obtained from all suspected lesions. If no abnormality was found, a biopsy of the middle of the oesophagus was taken (28-33 cm from the incisors at the 6 o'clock position). The biopsy tissue was then fixed in 10% formaldehyde, embedded in paraffin, cut into 5-μm-thick sections, and stained with haematoxylin-eosin [19].
Two pathologists from North Sichuan Medical College independently diagnosed the pathological sections. To avoid bias, the pathologists were unaware of the patients' medical history, endoscopy results, pathological outcomes, and predictive factor outcomes. Diagnoses included normal, oesophagitis, basal cell hyperplasia, moderate dysplasia and above (MDA), severe dysplasia and above (SDA), carcinoma in situ (CIS), squamous cell CIS, invasive squamous cell carcinoma, and adenocarcinoma. Before endoscopy, the patients completed a questionnaire about ESCC risk factors. The questionnaire covered age, body mass index (BMI), cigarette smoking status, alcohol drinking status, high-temperature food intake, preserved food intake, pharyngeal foreign body sensation, swallowing obstruction, pain behind the sternum, discomfort behind the breastbone, and family history of ESCC (Supplementary materials).

Structure determination
First, unconditional univariate logistic regression analysis was performed on the above observation factors for preliminary screening. Multivariate unconditional logistic regression was then used to calculate the odds ratio (OR) and 95% confidence interval (CI) of each factor and to establish a regression model. The final model structure of the variables was determined using the Akaike information criterion (AIC). Interactions were tested exhaustively in the multivariate model, and none was statistically significant. Because our survey system applied strict logical checks, there were no missing data for the variables in the questionnaire.
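As a minimal sketch of the two quantities named above, the following Python shows how an odds ratio with its Wald 95% CI is obtained from a logistic regression coefficient, and how the AIC is computed from a fitted model's log-likelihood. The study itself used SAS; the function names and all numeric inputs here are illustrative assumptions, not results from this study.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); a lower value is preferred."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_ci(beta=0.40, se=0.10)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")

# Two hypothetical candidate structures: (log-likelihood, number of parameters).
candidates = {
    "age only": aic(-520.0, 2),
    "age + symptoms": aic(-480.0, 6),
}
best = min(candidates, key=candidates.get)
print("preferred structure:", best)
```

The AIC penalises each added parameter, so a richer model is preferred only when its log-likelihood improves by more than the number of parameters it adds.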

Model development and performance testing
We developed two sets of models in this study, one with SDA (severe dysplasia and higher-grade lesions) and the other with MDA (moderate dysplasia and higher-grade lesions) defined as the outcome event [30]. In the SDA model, patients diagnosed with severe dysplasia, CIS, or ESCC were classified as cases. In the MDA model, moderate dysplasia was also regarded as an outcome event. The discriminative ability of the multivariate prediction model was compared with that of the simple age model; the area under the receiver operating characteristic curve (AUC) and the DeLong test were used to evaluate the performance of these models [31]. Leave-one-out cross-validation was adopted to evaluate the generalisation error based on the predicted probability of each patient from models built on all the remaining patients.
The data were statistically analysed using SAS 9.4 statistical software (SAS Institute), and MDA and SDA were analysed separately. Normally distributed measurement data are expressed as mean ± standard deviation (X̄ ± S) and were compared using the Student t-test. Measurement data with a skewed distribution are expressed as median and quartiles, M (25th percentile, 75th percentile), and were compared using the Mann-Whitney U test. Categorical data are expressed as the number of cases and percentage, n (%), and were compared using the chi-square test or Fisher exact test. Variables found meaningful in the univariate analysis were entered into multivariate logistic regression to explore the risk factors of EC. MedCalc software (MedCalc) was used to draw the receiver operating characteristic (ROC) curve, and R software (The R Foundation) was used to develop the nomogram. P<0.05 was considered statistically significant.
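The AUC and leave-one-out procedure described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' SAS or MedCalc workflow: `auc`, `leave_one_out_scores`, and the toy age-only scorer (`fit_midpoint`, `predict_midpoint`) are hypothetical names, the ages and labels are invented, and the DeLong test is not implemented here.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def leave_one_out_scores(fit, predict, rows, labels):
    """Score each subject with a model fitted on all the remaining subjects."""
    scores = []
    for i in range(len(rows)):
        model = fit(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        scores.append(predict(model, rows[i]))
    return scores

# Toy stand-in for a fitted model: the midpoint between mean ages of cases and controls.
def fit_midpoint(ages, labels):
    cases = [a for a, y in zip(ages, labels) if y == 1]
    controls = [a for a, y in zip(ages, labels) if y == 0]
    return (sum(cases) / len(cases) + sum(controls) / len(controls)) / 2

def predict_midpoint(midpoint, age):
    return age - midpoint  # a higher score means a higher predicted risk

ages = [45, 50, 55, 60, 65, 70]   # invented data
labels = [0, 0, 0, 1, 1, 1]       # 1 = outcome event
loo = leave_one_out_scores(fit_midpoint, predict_midpoint, ages, labels)
print("leave-one-out AUC:", auc(loo, labels))
```

Because each subject is scored by a model that never saw that subject, the resulting AUC estimates out-of-sample discrimination rather than apparent fit.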

Ethics Statement
This study was approved by the ethics committee of the Affiliated Hospital of North Sichuan Medical College. Written informed consent was obtained from the patients involved in this study or their close relatives. The research data were analysed anonymously, and personal identifiers were removed.

Results
Patients' characteristics
Table 1 shows the patients' demographic characteristics. From 2016 to 2019, 6409 patients were included, and the average age was 59.70 ± 7.94 years (maximum age, 78 years; minimum age, 40 years). Because of medical restrictions in China, patients' family history of ESCC was unknown and was not included in the data.

Model development and validation
Age, cigarette smoking, alcohol drinking, pharyngeal foreign body sensation, swallowing obstruction, pain behind the sternum, and discomfort behind the breastbone were entered into multivariate logistic regression (stepwise regression method). The results showed that all of these variables were risk factors of SDA. According to the AIC, we first constructed the joint prediction model for SDA and compared it with the simple age model and the simple BMI model. The ROC curve showed that age plays a major role in predicting high-grade oesophageal lesions and that, by including all other risk factors and alarm symptoms, the joint prediction model was significantly better than the simple predictors (AUC FULL, Figure 2). The sensitivity of the joint prediction model (83.5%) was higher than the sensitivities of the simple age model (76%) and the simple BMI model. In addition to the SDA model, we constructed a model to predict MDA lesions in this population. Age, sex, cigarette smoking, alcohol drinking, pharyngeal foreign body sensation, swallowing obstruction, pain behind the sternum, and discomfort behind the breastbone were also risk factors of MDA (Tables S1 and S2). In both the SDA and MDA models, pharyngeal foreign body sensation, swallowing obstruction, pain behind the sternum, and discomfort behind the breastbone were highly correlated with SDA and MDA.
We tried to determine whether the SDA model could also distinguish patients with moderate dysplasia, and the results showed that its ability to predict moderate dysplasia was moderate (Figure S1A, S1B). We developed a nomogram based on the structure of our model (Figure 4). Using this nomogram, patients can quickly estimate their risk of oesophageal squamous cell carcinoma from their own risk factor levels; it can also improve the detection rate of screening and remind patients with risk factors of EC to correct bad habits (e.g., smoking and drinking).
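The sensitivities quoted for the joint and age-only models depend on the cutoff applied to each model's predicted score. A minimal sketch of that calculation follows; the function name, scores, labels, and cutoff are invented for illustration and are not the study data.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        positive = score >= cutoff
        if label == 1 and positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented predicted risks and outcomes (1 = outcome event present).
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, cutoff=0.5)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Lowering the cutoff raises sensitivity at the expense of specificity, which is why two models are best compared over the whole ROC curve rather than at a single cutoff.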

Discussion
We developed a risk prediction model of ESCC and its precancerous lesions that has the potential to be applied in secondary prevention projects. The model can help patients make preliminary judgements, and it can improve the detection rate of ESCC and precancerous lesions. Our results provide important information for further clarifying the impact of clinical symptoms on early warning and prognosis of EC.
Kunzmann et al. [3] established a predictive model of oesophageal adenocarcinoma based on lifestyle risk factors through a prospective study. Age, sex, smoking, BMI, and history of oesophageal treatment were included in the final model (AUC = 0.80, 95% CI 0.77-0.82), which was used to predict patients with a higher absolute risk of oesophageal adenocarcinoma within 5 years. Ireland et al. [32] compared 120 Barrett oesophagus patients with 235 healthy individuals by age, sex, history of reflux, family history of reflux, history of hypertension, weekly alcoholic beverages, and BMI, and found significant differences; they established a Barrett oesophagus risk prediction model for the Australian population. The model's AUC was 0.82 (95% CI 0.78-0.87), showing good discrimination.
In this study, the SDA and MDA models included the same risk factors: age, cigarette smoking, alcohol drinking, pharyngeal foreign body sensation, swallowing obstruction, pain behind the sternum, and discomfort behind the breastbone. The MDA model's AUC indicated high discrimination ability. Other prediction models have used patients' living habits and living environment as predictive factors, such as pesticide exposure and the use of coal or wood as the main cooking fuel [30]. However, subjects often cannot answer such questionnaire items accurately, and because of economic and medical restrictions in China, a family history of EC cannot be accurately ascertained in patients older than 60 years of age, which makes such prediction models difficult to apply. The predictors in our model accurately reflect the patients' condition and can be used easily in screening practice. Compared with models from hospital-based case-control designs, the main advantage of this study is that our prediction model was based on a large population-level screening procedure. Patients who participated in the screening were randomly selected from Nanchong City, Sichuan Province, a high-incidence area of EC, so they can describe the distribution of EC and precancerous lesions in the general population. We used screen-detected cases, including severe dysplasia, CIS, and ESCC, as the outcome for model construction. Early malignant lesions are the target of ESCC screening work [33]. Compared with models established from case-control studies with advanced ESCC as the outcome event, our model is more suitable for early detection of ESCC and its malignant lesions.
The present study showed that SDA and MDA are significantly related to smoking and drinking. Compared with non-smokers, smokers have a significantly increased risk of SDA and MDA, which is consistent with the findings of previous case-control and cohort studies [34,35]. Drinking alcohol produces carcinogenic metabolites, and heavy drinking especially increases the accumulation of such metabolites in the body, which in turn increases the risk of ESCC [36,37].
Defining the characteristics and criteria of early symptoms of EC, and further clarifying the relationship between these symptoms and the course and pathological characteristics of EC, are key to determining whether these symptoms can be used for early warning of EC. In the absence of effective molecular markers for early warning and diagnosis of EC, this strategy is also one of the most effective, low-cost, noninvasive, and easily accepted and popularised methods of EC prevention and control. Western literature has reported that the main early warning symptoms of oesophageal adenocarcinoma are difficulty swallowing and choking [38]. Current research on symptoms and tumours indicates that lack of symptoms and lack of awareness of the risk of alarm symptoms are risk factors for poor tumour prognosis [2]. The most common presenting symptoms of EC are dysphagia and weight loss. Other symptoms include odynophagia, upper gastrointestinal bleeding, hoarseness, and respiratory symptoms [39]. Based on large-scale screening tests, we summarised the warning symptoms of early oesophageal tumours. Although the value of each individual alarm symptom was relatively low, it may be possible to avoid unnecessary endoscopy by combining various symptoms with patient risk factors to make a comprehensive judgement.
Some previous studies have also evaluated the value of age and warning symptoms in predicting cancer risk in patients with dyspepsia [20][21][22]. These studies used age and warning symptoms to predict upper gastrointestinal cancer. Bai and colleagues studied the predictive value of alarm symptoms and age for upper gastrointestinal malignancies in China and found that age or any alarm symptom has limited value. In their study, alarm symptoms were highly specific, but their sensitivity was low. However, most of their discussion was based on the positive diagnostic likelihood ratio for each symptom, and they did not use all predictors to build models to predict the risk of upper gastrointestinal malignancies [21]. Fransen and colleagues [40] conducted a meta-analysis and found limited diagnostic value for each alarm symptom. They suggested that alarm symptoms may be related to other factors and that combined use may be a better tool for selecting high-risk patients. However, they could not test their hypothesis. Numans and colleagues [20] used a calculated total score to establish a risk prediction model and showed that, within such a model, the classic warning symptoms are useful predictors of upper gastrointestinal malignancies. However, their model is somewhat complicated and contains multiple variables, so it is somewhat unstable. The above studies all predicted upper gastrointestinal tumours in general, which is not practical for high-incidence areas of EC in China. In our risk prediction model, various symptoms and patients' lifestyle habits can be used to diagnose EC.
Disease risk prediction models can provide personalised estimates of ESCC risk based on personal baseline data [41]. Using risk prediction models to distinguish high-risk groups from the general population will help formulate highly effective ESCC prevention and treatment strategies [42]. The first way of using this model applies when the burden of EC is high and endoscopic screening resources are limited, so that it is urgent to identify more patients with early malignant lesions, as in the national ESCC screening plan for some high-risk areas of EC in China [43]. In this setting, the nomogram can be used to select individuals with a predicted risk > 0.8, which maximises the detection rate of malignant lesions; compared with general screening, the detection rate of SDA and MDA lesions will increase. The second way applies under conditions of unlimited resources, such as those that usually exist in clinical and scientific research. In that setting, subjects with a risk > 0.5 can be selected for screening to ensure high sensitivity.
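The two usage scenarios above amount to applying different cutoffs to the same predicted risk. The sketch below assumes a standard logistic form for the risk; the intercept, coefficients, variable names, and helper functions are hypothetical placeholders and are not the fitted values behind the nomogram. Only the 0.8 and 0.5 thresholds come from the text.

```python
import math

def predicted_risk(intercept, coefs, features):
    """Logistic risk: 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    linear = intercept + sum(coefs[name] * features.get(name, 0) for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))

def refer_for_endoscopy(risk, limited_resources):
    """Apply the 0.8 cutoff when endoscopy capacity is limited, otherwise 0.5."""
    threshold = 0.8 if limited_resources else 0.5
    return risk > threshold

# Hypothetical coefficients and one hypothetical subject, for illustration only.
coefs = {"age_per_10y": 0.5, "smoking": 0.7, "swallowing_obstruction": 1.2}
subject = {"age_per_10y": 6.5, "smoking": 1, "swallowing_obstruction": 1}
risk = predicted_risk(intercept=-4.5, coefs=coefs, features=subject)

print(f"predicted risk = {risk:.2f}")
print("refer (limited resources):", refer_for_endoscopy(risk, True))
print("refer (unlimited resources):", refer_for_endoscopy(risk, False))
```

A stricter cutoff concentrates endoscopy on the subjects most likely to harbour lesions, while the lower cutoff trades more examinations for higher sensitivity.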
This study has several limitations. Although our study screened 6409 participants, the study population was still limited, and because the screened population was determined by whether patients participated in screening, selection bias may have been introduced. Multi-centre external verification and calibration are required in future studies. Because of regional differences in EC, validation studies conducted in other populations are also crucial for the generalisation of the model. Prospective collection of more malignant events and dynamic observation of exposure to predictive factors are also essential to improving these models.
In conclusion, our EC high-risk population screening model, constructed from screening population information, has good discriminatory and calibration capabilities and can be used to assess the risk of local EC. Screening high-risk populations may make EC screening more cost-effective. Additionally, the model can be applied to other Asian populations with similar socioeconomic or lifestyle characteristics, who may also benefit from our ESCC risk prediction model.

Abbreviations
AIC, Akaike information criterion; AUC, area under the receiver operating characteristic curve; BMI, body mass index; OR, odds ratio; CI, confidence interval; CIS, carcinoma in situ; ESCC, oesophageal squamous cell carcinoma; MDA, moderate dysplasia and above; ROC, receiver operating characteristic; SDA, severe dysplasia and above; NBI, narrow band imaging.

Figure 1
Research flow chart
MDA, moderate dysplasia and above; SDA, severe dysplasia and above; AUC, area under the receiver operating characteristic curve

Figure 2
Comparison of the receiver operating characteristic curves between the SDA joint prediction model and simple age and simple BMI prediction models. The performance of the SDA prediction model for 6409 early cancer screenings in rural areas of Nanchong, Sichuan Province, China from 2016 to 2019 is shown.
SDA, severe dysplasia; BMI, body mass index; AUC, area under the receiver operating characteristic curve; CI, confidence interval

Figure 3
Comparison of the joint prediction model and simple age prediction model (leave-one-out cross-validation).

Figure 4
Nomogram for severe dysplasia

Supplementary Files
This is a list of supplementary files associated with this preprint.