Estimation of The Transitional Probability Among Esophageal Cancer and Its Precancerous Lesions

Background: To estimate the transition probabilities of esophageal cancer (EC) and its precancerous lesions during carcinogenesis using a Markov model, which could provide important information for EC screening with regard to choosing reasonable screening and follow-up intervals. Methods: The transition probabilities among pathological stages were estimated by establishing Markov models for the natural history of EC and repeatedly adjusting and calibrating the models by comparing the modeled incidence and distributions of pathological stages (alone or combined) with data observed under real-world conditions. Results: The one-year probabilities were 0.024, 0.05, and 0.12 for progression from the healthy state to mild dysplasia (mD), from mD to moderate dysplasia (MD), and from MD to severe dysplasia/carcinoma in situ (SD/CIS), respectively. The age-specific transition probabilities were 0.08~0.18 for SD/CIS progressing to intramucosal carcinoma (IC), 0.4~0.87 for IC to submucosal carcinoma (T1N0M0) (SC), and 0.2~0.85 for SC to invasive carcinoma (INC). The progression probabilities increased with age and the severity of the disease. Based on the estimated transition probabilities, we predicted the incidence of EC and the distributions of its pathological stages. Comparisons between the modeled results and observed data confirmed the validity of our transition probabilities. Conclusion: The estimated transition probabilities of EC and its precancerous lesions were reliable and could be used to address questions such as the optimal screening frequency, screening intervals, and health economic evaluation of screening strategies.


Introduction
Esophageal cancer (EC) is a tumor with high morbidity and mortality. According to the International Cancer Registry, there were an estimated 572,000 new cases and 509,000 deaths worldwide in 2018, ranking seventh and sixth among malignant tumors in incidence and mortality, respectively [1]. China is one of the countries with the highest incidence of EC. According to the latest data released by the National Cancer Center, the incidence of EC in China was 17.87/100,000 and the death rate was 13.68/100,000 in 2015, ranking sixth and fourth among malignant tumors in incidence and mortality, respectively [2]. EC affects a wide range of areas in China, covering about 200 million people, mainly in the Taihang mountain areas. In these high-risk areas, EC mortality reaches 68.3/100,000 [3], far above the national level of 13.68/100,000 [2].
More than 95% of the patients treated for EC in the high-incidence areas were in the middle or advanced stages, and the overall 5-year survival rate was less than 10%. However, if early detection, early diagnosis, and early treatment can be achieved, the 5-year survival rate of early EC can exceed 95% [4]. Accordingly, early detection and treatment through screening programs can reduce the threat of EC to the population in high-risk areas [5].
Choosing reasonable screening intervals plays an important role in EC screening, and screening intervals are closely tied to the transition probabilities from one health state to another during esophageal carcinogenesis. Some follow-up studies in small cohorts of patients with precancerous lesions, and some short-term test-retest endoscopy studies, have provided transition probabilities for parts of the process of EC development. However, the transition probabilities obtained from small populations varied significantly among studies. Two-fold differences were found in the transition probability of mild dysplasia (mD) progressing to moderate dysplasia (MD) in previous studies [6,7], and some probabilities were not available at all, such as the progression probability from intramucosal carcinoma (IC) to submucosal carcinoma (SC) [8]. Thus, the applicability of previous transition probabilities is limited.
As far as we know, large population-based prospective randomized studies are ethically difficult and expensive to conduct, and their results would only be available after decades, whereas the Markov model has been successfully used in studies of other cancers to address similar questions [9]. The Markov model is considered a powerful tool for simulating the natural history of chronic diseases. In a Markov model, the health states that patients pass through are defined separately; then, by modeling on the basis of a system of transition probabilities among states within a cycle (usually one year), the development of the disease in a population (such as disease-specific incidence and mortality) can be estimated [10]. Conversely, transition probabilities can be estimated by repeatedly adjusting and calibrating a Markov model, comparing the modeled incidence and mortality (alone or combined) with data observed under real-world conditions. This paper aims to estimate the transition probabilities for EC and its precancerous lesions by establishing Markov models based on the natural history of EC, which can provide important information for EC screening with regard to choosing reasonable screening and follow-up intervals.

Overview
The transition probabilities among health states of EC were estimated based on established Markov models for the natural history of EC. The natural history model can predict the incidence of EC on the basis of EC mortality, all-cause mortality, the prevalence of each pathological stage of EC, and the transition probabilities. Conversely, since the incidence, mortality, and prevalence data above were available, we could estimate the transition probabilities.

Study design
This study was based on the project for early diagnosis and treatment of EC, which is described in detail in our previous paper [11]. Through cluster sampling, Hejian Town in Linzhou County, Henan Province, was selected as the screening site, and the population aged 40-69 in this town was the target screening population. Those without contraindications for endoscopy were examined by endoscopy with iodine staining, and those with positive results underwent pathological examination; their pathological results, age, gender, and other basic information were recorded in detail. For precancerous lesions and esophageal cancer diagnosed by screening, the treatment principles were as follows: (1) for severe dysplasia, carcinoma in situ, and intramucosal carcinoma, endoscopic mucosal resection (EMR) or argon plasma coagulation (APC) was recommended, with endoscopic follow-up in the first year after treatment; (2) for submucosal carcinoma (T1N0M0), esophagectomy was recommended; (3) for invasive carcinoma, common treatment modalities were chosen depending on disease severity and could include surgery, radiotherapy, chemotherapy, or a combination. In addition, the incidence of EC in Linzhou County in 2005 for each 10-year age group from 40 to 69 years and the statistical monitoring data of 321,737 deaths in 2004-2006 were obtained from the Linzhou County Cancer Registry.

Natural history of EC
We developed an 8-state Markov model simulating the natural history of esophageal carcinogenesis: normal, mild dysplasia (mD), moderate dysplasia (MD), severe dysplasia/carcinoma in situ (SD/CIS), intramucosal carcinoma (IC), submucosal carcinoma (T1N0M0) (SC), invasive carcinoma (INC), and death [12,13]. The state-transition Markov model based on the natural history of EC has been described in the literature and is shown in Figure 1 [14]. At the start of the model, a hypothetical cohort is distributed among these Markov states, except the "death" state. During each Markov cycle (1 year), a person may remain in the same health state, progress to a more severe state, regress to a less severe state, or die from EC or from other causes. The health state in the next year depends only on the health state in the current year and the corresponding transition probabilities [15]. Taking into consideration the eligible screening age under real-world conditions (40-69 years old) and the average life expectancy (73 years [16]), the hypothetical cohort aged 40 moves through the Markov model and is followed up for 30 years. TreeAge Pro 2009 Suite (TreeAge Software Inc.) was used for all analyses.
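The cohort mechanics described above can be illustrated with a minimal Python sketch. It is not the TreeAge model used in this study. The first three progression probabilities and the initial distribution are the values reported in this paper; the last three progression values are the lower ends of the reported age-specific ranges, applied without age dependence; the regression and death probabilities (`REGRESS`, `DEATH`) are hypothetical placeholders chosen only so that the example runs.

```python
import numpy as np

STATES = ["normal", "mD", "MD", "SD/CIS", "IC", "SC", "INC", "death"]

# One-year progression probabilities to the next state. The first three are the
# estimates reported in this paper; the last three are the LOWER ends of the
# reported age-specific ranges, used here without any age dependence.
PROGRESS = [0.024, 0.05, 0.12, 0.08, 0.4, 0.2]

# HYPOTHETICAL placeholders (not from the paper): regression to the previous
# state and death from each living state.
REGRESS = [0.0, 0.10, 0.10, 0.05, 0.0, 0.0, 0.0]
DEATH = [0.005, 0.005, 0.005, 0.005, 0.01, 0.01, 0.20]

# Initial distribution of the 40-44 age group as reported in the paper.
INITIAL = [0.8895, 0.082, 0.018, 0.009, 0.0008, 0.0005, 0.0002, 0.0]


def build_matrix(progress, regress, death):
    """Return a row-stochastic one-year transition matrix."""
    n = len(STATES)
    dead = n - 1
    P = np.zeros((n, n))
    for i in range(dead):                 # loop over living states
        if i + 1 < dead:                  # INC has no further stage
            P[i, i + 1] = progress[i]
        if i > 0:                         # normal cannot regress
            P[i, i - 1] = regress[i]
        P[i, dead] = death[i]
        P[i, i] = 1.0 - P[i].sum()        # remaining mass stays in place
    P[dead, dead] = 1.0                   # death is absorbing
    return P


def run_cohort(initial, P, cycles):
    """Propagate the cohort distribution for the given number of cycles."""
    dist = np.asarray(initial, dtype=float)
    history = [dist]
    for _ in range(cycles):
        dist = dist @ P                   # Markov property: next year depends on this year only
        history.append(dist)
    return np.array(history)


P = build_matrix(PROGRESS, REGRESS, DEATH)
history = run_cohort(INITIAL, P, cycles=30)   # 30 one-year cycles from age 40
print(dict(zip(STATES, history[-1].round(4))))
```

Each row of `history` is the state distribution of the cohort after one more cycle. A faithful reproduction would replace the fixed matrix with age-specific matrices and the death probabilities of Table 1.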

Parameters used in the Markov model
To establish the Markov model of the natural history of EC, the following parameters were needed: initial probabilities, transition probabilities, and death probabilities. The initial probability is the prevalence of each health state among cohort members at the start of the modeling [17]. The transition probability is the likelihood of progression or regression from one health state to another in a Markov cycle. The death probability is the probability that a person dies from EC or other causes in each model cycle. The EDETEC project was launched by the Chinese Central Government in 2005, aiming to increase the early detection and treatment rate, the five-year survival rate of EC, and so forth [18]. By 2008, 8267 participants aged 40-69 years had received endoscopy, and nearly 3000 had undergone pathological examination. Most of the esophageal cancers were squamous cell carcinoma, while only 5% were Barrett's esophagus. The initial distribution probabilities of the 7 Markov states other than death (from "normal" to "INC") were 88.95%, 8.2%, 1.8%, 0.9%, 0.08%, 0.05%, and 0.02%, respectively, in the 40-44 year age group [14].

Death probability
A previous study has shown that persons with SD/CIS or less severe lesions may not die from EC, that IC or SC cases may die from any cause including EC, and that INC patients can be assumed to die mainly from EC [14]. Therefore, in the natural-history model, the death probabilities for these three groups were converted from non-esophageal-cancer mortality, all-cause mortality, and the case fatality rate of EC, respectively. All were obtained from published data, which were compiled from the Linzhou County Cancer Registry during 2004-2006, and from the results of our prospective cohort study based on the EC chemoprevention trial of selenomethionine and celecoxib in the "Early Detection of EC" (EDEC) program [14,19]. Table 1 shows the age-specific death probabilities of the different Markov states.
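The paper does not state which formula was used to convert mortality rates into per-cycle death probabilities. A common choice, assuming a constant rate within the cycle, is p = 1 - exp(-r t). The sketch below illustrates that assumption only; the function names are hypothetical, and the two example rates are the EC mortality figures quoted in the Introduction.

```python
import math


def rate_to_probability(rate, years=1.0):
    """Convert a constant event rate (per person-year) to a probability over `years`."""
    return 1.0 - math.exp(-rate * years)


def probability_to_rate(prob, years=1.0):
    """Inverse conversion; requires 0 <= prob < 1."""
    return -math.log(1.0 - prob) / years


# EC mortality figures quoted in the Introduction (per 100,000 per year).
national = rate_to_probability(13.68 / 100_000)
high_risk = rate_to_probability(68.3 / 100_000)
print(f"national: {national:.6f}, high-risk areas: {high_risk:.6f}")
```

For rates this small the probability is almost identical to the rate; the distinction matters mainly for high rates such as the case fatality of invasive carcinoma.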

Transition probability
The transition probabilities among health states were estimated using the approach taken by others [20-23]. First, ranges for the transition probabilities were determined from published studies [6,7,24-30], cohort data from the chemoprevention trial of the EDEC program mentioned previously [19], and expert opinion, which served as the initial data set. Because of limited sample sizes, the transition probabilities differed significantly among published studies, and some parameters for progression and regression of the disease were unavailable. Thus, in the second step, the transition probabilities were hierarchically calibrated so that the modeled age-specific EC incidence curves fit the empirical ones observed in real-world settings. In the third step, we further adjusted the transition probabilities to obtain a distribution of each pathological stage of EC similar to that of the surveillance data. Observed data from the Linzhou County Cancer Registry during 2004-2006 and the screening results of the EDETEC project [18] during 2005-2008 were used for the model calibration.
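The idea of calibrating a transition probability against an observed target can be illustrated with a deliberately small toy example. This is not the hierarchical, multi-parameter calibration performed in the study: it uses a hypothetical two-state model (normal and lesion), a hypothetical fixed regression probability of 0.1, and bisection as a stand-in search method. The target is generated from the model itself with a progression probability of 0.024, so the sketch only shows that the search recovers the value that produced the target.

```python
def modeled_prevalence(p_progress, p_regress=0.1, cycles=30):
    """Share of a two-state cohort (normal, lesion) in the lesion state after `cycles` years."""
    normal, lesion = 1.0, 0.0
    for _ in range(cycles):
        normal, lesion = (
            normal * (1.0 - p_progress) + lesion * p_regress,
            lesion * (1.0 - p_regress) + normal * p_progress,
        )
    return lesion


def calibrate(target, lo=0.0, hi=0.9, tol=1e-10):
    """Bisection search for the p_progress whose modeled prevalence matches `target`.

    Relies on modeled_prevalence increasing with p_progress on [lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if modeled_prevalence(mid) < target:
            lo = mid          # model under-predicts: progression must be higher
        else:
            hi = mid          # model over-predicts: progression must be lower
    return (lo + hi) / 2.0


# Synthetic target generated from the model itself (not observed data).
target = modeled_prevalence(0.024)
fitted = calibrate(target)
print(f"target prevalence {target:.4f} -> fitted p_progress {fitted:.4f}")
```

With several unknown probabilities and several targets (age-specific incidence and stage distributions), a one-dimensional search is no longer sufficient, which is why the study calibrated the parameters step by step within literature-derived ranges.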

Model validation
The incidence and prevalence of EC are closely tied to the transition probabilities among its precancerous lesions. Internal validation of our results was performed by comparing the model predictions with observed epidemiological data from the Linzhou County Cancer Registry during 2004-2006 and the screening results of the EDETEC project during 2005-2008. Validation outcomes included the age-specific incidence of EC and the distribution of the pathological stages of EC.

Transition probabilities
The transition probabilities from a cancer-free state to an EC state and the age-specific transition probabilities from one preclinical cancer state to the next are presented in Table 2. In the Markov model, the transition probabilities out of each state, including the probability of remaining in that state, sum to 1. The transition probabilities of IC and SC vary greatly with age; therefore, for IC and SC, we obtained transition probabilities for five-year age groups through expert consultation and repeated fitting. All transition probabilities were within or very close to the related ranges from the literature [6,7,24-30]. The progression probability increased with the severity of the disease, and the corresponding regression probability decreased. The age-specific progression probabilities for SD/CIS, IC, and SC cases increased with age. The estimated pathological stage distribution of EC is shown in Table 3; it closely approximated the results of the EDETEC project during 2005-2008 and fell within the 99% confidence intervals of the screening data. The proportions declined with the severity of the disease: the mD stage ranked first (over 15%), followed by the MD and SD/CIS stages, and the proportion of the EC stage (including IC, SC, and INC) was the lowest (about 0.8%).
The age-specific distributions of mD, MD, SD/CIS, and EC (including IC, SC, and INC) predicted by the model were similar to the corresponding screening results. Figure 2 shows that the proportions increased with age, with the 65-69 age group ranking first. This trend is consistent with the natural history of EC and with reports from other high-risk areas in China [34,35]. Table 4 compares the incidence of middle and advanced EC obtained by model fitting with the incidence of EC in Linzhou County from 2004 to 2006. The trend of the age-specific incidence rates from 40 to 69 years estimated by the Markov model was similar to that reported in Linzhou County: the incidence rates increased with age. The model estimates were approximately 97-107% of the observed values, so the incidence obtained by model fitting is highly consistent with the Linzhou County Cancer Registry report.
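The requirement that each row of the transition matrix sums to 1 can be written out as follows. The notation is an assumption introduced here, not taken from the paper: p_{ij}(a) is the one-year probability of moving from state i to state j at age a, and d_i(a) is the death probability from state i. The second expression assumes, as in this model, that a person can move by at most one stage per cycle.

```latex
\sum_{j} p_{ij}(a) = 1,
\qquad
p_{ii}(a) = 1 - p_{i,i+1}(a) - p_{i,i-1}(a) - d_{i}(a)
```

Under this constraint, only the progression, regression, and death probabilities need to be estimated; the probability of remaining in a state follows as the remainder.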

Discussion
To our knowledge, our study is the first to comprehensively present transition probabilities for the natural history of EC based on a state-transition Markov model for further evaluation of EC screening projects. The transition probabilities among some health states estimated in our paper differed in part from published data. Most previous studies on the natural history of EC were conducted by repeated endoscopic screening or follow-up observation of precancerous lesions, but these methods require a long time, and the probabilities between some states might not be obtainable. Moreover, the natural transition probabilities for some precancerous lesions are ethically difficult to obtain. Although previous studies have reported transition probabilities for some pathological stages of EC, the data differed significantly among studies, mainly owing to small sample sizes [6,7,24-30]. In contrast, the Markov model has been successfully used to simulate the natural history of some malignant tumors [36,37], so we used it to simulate the natural history of EC and estimate the transition probabilities between its various states.
The Markov model is a useful method for describing the process of individuals passing through a series of states in continuous time. Such models explain how patients move between long-term disease states and are popular for describing the natural history of disease [38,39]. Another advantage of the Markov model is that it can make use of data with limited quality and accuracy, which may have been collected retrospectively [40]. In the present study, we constructed a Markov model to estimate the 1-year transition probabilities of the various health states of EC (normal, mD, MD, SD/CIS, IC, SC, and INC). In particular, we fitted the age-specific proportion of EC at each pathological stage and predicted the incidence rates of EC in our model, which were finally compared with the screening data of Linzhou County.
In this study, we established a Markov model for the natural history of EC on the basis of the transition probabilities. We set the disease to progress by only one stage per year, without considering the possibility that a few individuals could progress by two stages in a year, because the proportion of such individuals in the population is very small. We also tried a model that allowed such transitions; because their probabilities were very small, they had little effect on the outcome. Compared with the transition probabilities found in the literature, the fitted transition probabilities in this study are relatively consistent, except for the transition probabilities from healthy to mD and from mD to MD. Because of the small sample sizes of the earlier literature, however, the stability and generalizability of those published results may be limited. In addition, the age-specific proportions of mD, MD, SD/CIS, IC, SC, and INC simulated with the fitted transition probabilities are close to the real proportions obtained by actual screening, and this consistency holds after 30 cycles. We believe that the fitted transition probabilities have high credibility and good authenticity.
Most of the data used in our model came from the Linzhou County Cancer Registry; in Linzhou County, extensive endoscopic screening has been conducted since the 1980s, and systematic cancer incidence and death registries have been established. Therefore, the relevant data from Linzhou County are reliable. The model did not use data from the most recent two years to estimate the transition probabilities of EC, because the model requires many parameters and recent data are very limited or unavailable. However, the change in EC in Linzhou County over the last 10 years was not significant, the transition probabilities in this model were obtained after 30 cycles of estimation, and the modeled incidence of middle- and late-stage disease at 40-69 years was close to the monitoring results of Linzhou County.
The model was internally validated by comparing the model predictions with epidemiological data from the Linzhou County Cancer Registry. The model-predicted age-specific distributions of the pathological stages of EC matched the screening results of Linzhou County well, and the trends were similar to data reported in Ci County, another high-risk area for EC in China [41]. Owing to its inclusion of a younger age group, the results of the Anyang trial were not comparable to our predictions [35]. The age-specific incidence of EC estimated by modeling was quite close to the Linzhou County Cancer Registry data. However, further comparisons with other areas were not conducted because relevant reported data were unavailable. In general, the estimated transition probabilities in this study were reliable for high-risk areas in China.
Our study also has several limitations. First, the transition probabilities were estimated on the basis of limited published data and short-term cohort studies. Although the model validation results confirmed a relatively high internal validity, the external validity of the model could not be assessed because no other data sets were available. There is insufficient evidence to extend the results of this study to other regions. The results of the model depend to a large extent on the quality of the registration and screening data, which carries a potential risk of bias. Therefore, long-term cohort studies with large sample sizes are needed to test and improve our results; however, this kind of epidemiological research is currently absent in China. Second, the transition probabilities were estimated based on data from high-risk areas in China. Differences in sociodemographic structure and in risk factors such as lifestyle and dietary habits across China mean that caution is needed when transferring our results to low-risk areas for EC. Finally, the transition probabilities of all health states should vary with age. However, because of the small sample sizes in the literature and the unavailability of data, the transition probabilities of the normal, mD, and MD states were fixed, which would affect the results of the model to some extent [14]. Therefore, we should confirm our transition probabilities by following up the natural history of a larger sample of people, and further improve our model of the natural history of EC and its transition probabilities.

Conclusion
We systematically estimated the transition probabilities of EC; estimating the natural history of EC using a Markov model is scientifically sound and feasible. The transition probabilities should be tested and improved by follow-up studies of large cohorts in the future. They could be used to explore suitable screening strategies, addressing questions such as the optimal screening frequency, screening intervals, and the optimal age to start screening. In addition, the transition probabilities could be used to establish Markov models for different screening strategies and to evaluate their health economic effects, which would guide policy makers in health care resource allocation.

Compliance with ethical standards

Conflict of interest
All authors declare that there is no conflict of interest.

Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent
Informed consent was obtained from all individual participants included in the study.

Table 2 (surviving entries): progression to mD: 0.024; 0.0454-0.0591 [6,24]. 65-69 years old: 0.180. Remaining in INC: 0.2304 [32,33].
*: transition probabilities have been used in the health economic evaluation exploring preferable screening strategies for EC in high-risk areas of China [14]. mD: mild dysplasia; MD: moderate dysplasia; SD/CIS: severe dysplasia/carcinoma in situ; IC: intramucosal carcinoma; SC: submucosal carcinoma; INC: invasive carcinoma.

Table 3 notes: "Estimated": proportions estimated through Markov model simulation. "Screening": proportions calculated on the basis of the screening results of the "Early Detection and Early Treatment of EC in Demonstration Centers in China" project during 2005-2008. "99% CI": 99% confidence interval. mD: mild dysplasia; MD: moderate dysplasia; SD/CIS: severe dysplasia/carcinoma in situ; IC/SC/INC: intramucosal carcinoma/submucosal carcinoma/invasive carcinoma.