M: Mechanism of occurrence
The Mechanism of occurrence in infectious diseases encompasses a complex interplay of factors that determine the development, transmission, and control of these diseases. Understanding the mechanism of occurrence is fundamental for accurate modeling and prediction of disease dynamics, as well as for informing effective intervention strategies. In this section, we delve into the key components that constitute the mechanism of occurrence, including disease natural history, transmission process, risk factors, and possible interventions.
M1: Disease Natural history
The natural history of a disease encompasses the entire trajectory of the disease, starting from its onset and progressing through various stages until its outcome, without any treatment or intervention1. From an individual perspective, the natural history of an infectious disease involves the complete process from susceptibility to latent infection, symptomatic or asymptomatic infection, and ultimately recovery or death. When considering the natural history, key epidemiological characteristics of the infectious disease are considered, including infectivity, pathogenicity, and virulence.
The disease process is characterized by dynamic changes in an individual's status, including susceptibility, exposure, symptomatic manifestations, and recovery, among others. Therefore, it is essential to elucidate the natural history of the disease process by tracking these status updates. The flowchart of statuses may vary depending on the specific type of infectious disease.
M2: Transmission process
Developing transmission dynamic models requires a comprehensive understanding of the disease, encompassing various aspects such as transmission patterns, incubation periods, infectious periods, and population demographics. Selecting an appropriate modeling approach relies on understanding the primary modes of transmission, such as respiratory droplets, direct contact, or vector-borne transmission through organisms like mosquitoes.
Transmission dynamic models are based on essential characteristics known as the "three links" (infectious source, transmission route, and susceptible population) and the "two factors" (natural and social factors). These models consider multiple transmission routes, including human-to-human, environmental (e.g., through food or water), and vector-to-human transmission. Additionally, the influence of natural factors, such as environmental conditions like temperature and humidity, on pathogen survival and transmission is considered.
Transmission dynamic models incorporate practical control measures to align with real-world transmission and disease control efforts. These measures encompass both pharmacological interventions, such as antiviral drugs, antibiotics, and vaccines, and non-pharmacological interventions like contact tracing, testing and screening, school closures, hand hygiene, social distancing, and mask-wearing. Environmental disinfection, drinking water treatment, and vector control strategies are also considered.
M3: Risk factors
Risk factors play a critical role in shaping the transmission and impact of infectious diseases. By understanding and identifying these factors, we can gain insights into the vulnerability of populations, the severity of disease outcomes, and the potential for disease spread. In this section, we explore the two broad categories of risk factors: natural factors and social factors (Fig. 2).
M3.1 Natural factors
Natural factors encompass a range of environmental, geographic, and ecological elements that influence the prevalence and distribution of infectious diseases. For example, meteorological factors such as temperature, humidity, and rainfall patterns directly impact the survival and transmission of pathogens. Geographic factors, including terrain, proximity to water bodies, and elevation, can affect the distribution of disease vectors or reservoirs. Ecological factors consider the intricate interactions between pathogens, hosts, and the environment, highlighting the complex dynamics that contribute to disease emergence and persistence.
Geographical factors have a pronounced impact on disease prevalence. The distribution of diseases and their vectors is influenced by terrain, proximity to water bodies, and elevation. For example, the geographic distribution of vector organisms can vary considerably. Meteorological factors play a crucial role in the transmission dynamics of insect-borne infectious diseases and zoonotic diseases. Temperature directly affects the activity and growth cycles of insect vectors. Temperature also has a considerable impact on respiratory infectious diseases, as lower temperatures during winter, combined with weakened human resistance, tend to result in higher incidences of respiratory infections like influenza.
Ecological factors, alongside meteorological factors, significantly contribute to the prevalence of infectious diseases. These factors encompass the intricate interactions between pathogens, hosts, and the environment. Disruptions to ecosystems, such as habitat fragmentation, deforestation, and changes in land use, alter the distribution and abundance of disease vectors and reservoirs, leading to increased contact between humans, wildlife, and vectors. This heightened interaction facilitates the spillover of zoonotic diseases into human populations. The ecological balance within ecosystems plays a crucial role in the amplification or suppression of infectious diseases.
M3.2 Social factors
Social factors, on the other hand, encompass various societal and behavioral aspects that influence the transmission and control of infectious diseases. These factors include socioeconomic conditions, living standards, healthcare access and infrastructure, educational levels, cultural practices, and population density.
Socioeconomic conditions and living standards significantly impact disease spread. Access to clean and hygienic living environments, free from toxins, is essential in reducing the occurrence of certain diseases.
Healthcare access and the level of public health services are critical factors in infectious disease outcomes2. Improved medical and health conditions, coupled with robust public health measures, enhance disease prediction, diagnosis, and treatment. Increased vaccine coverage and improved detection systems contribute to a decrease in the incidence of infectious diseases.
Moreover, the social system and the speed of government response have a significant impact on epidemic control. Strict enforcement of importation measures, quarantine protocols, and effective treatment strategies have proven crucial in containing the spread of infectious diseases, as exemplified during the COVID-19 pandemic.
Recognizing the interplay between social factors and infectious diseases is vital for effective management and prevention. By understanding the societal context, interventions can be tailored to address specific risk factors and promote behavior change. It is essential to coordinate public health interventions with the management of environmental factors to achieve comprehensive and sustainable disease control.
Overall, a comprehensive understanding of social factors, alongside other epidemiological considerations, is crucial for designing and implementing effective strategies to mitigate the impact of infectious diseases and protect public health.
M4: Possible interventions
Integrated interventions target the three basic components of the epidemiological process of infectious diseases. According to the characteristics of each disease, they are directed at the leading links of transmission to prevent continued spread (Fig. 3).
M4.1 Managing sources of infection
The key elements encompass: 1) timely reporting of infectious diseases, 2) control measures for patients, carriers, and close contacts, 3) control measures for animal sources of infection, and 4) measures for environmental contamination of infected sites.
M4.2 Interrupting transmission routes
Specific measures are employed based on the transmission process of the infectious disease: 1) intestinal infectious diseases: effective management of the disposal of feces and other contaminants, and environmental disinfection; 2) respiratory infectious diseases: air disinfection, ventilation, and personal protection (wearing masks); 3) insect-borne infectious diseases: insecticide use and pest control; 4) infectious diseases with complex transmission routes: establishment of comprehensive protective measures to address the various transmission patterns.
M4.3 Safeguarding highly susceptible populations
The primary measures are vaccination, building an immune barrier, giving preventive medication to people at risk, and taking personal protective measures.
O: Observed and collected data
Observation and data collection are essential in the infectious disease modeling process. They help to determine the epidemiological characteristics of infectious diseases, such as the rate of virus transmission, incubation period, and mode of transmission. This information is essential for accurate modeling and prediction of disease spread. By analyzing epidemic data, we can forecast the trajectory and magnitude of future outbreaks, assess the effectiveness of control measures, and optimize disease control strategies.
O1: Samples of infected individuals
Obtaining case-specific information is essential for understanding the dynamics of infectious diseases. On-site surveys or historical surveillance data are used to gather data on infected individuals. Stratification of infections based on different dimensions is often necessary.
O2: Demographic features
In our increasingly interconnected world, demographic factors play a significant role in disease transmission. Factors such as urbanization, population aging, travel, and migration contribute to the spread of epidemics. Understanding the link between environmental factors, human health, and disease transmission is crucial. Global climate change, for example, affects the distribution of vector-, food-, or water-borne diseases and interacts with vulnerability factors and disease transmission dynamics. Additionally, health equity is closely tied to economic growth, healthcare resources, and the accessibility of educational resources. Gathering demographic data, such as birth rates, death rates, population numbers, and migration patterns, from reliable sources like the WHO, World Bank, or national statistical yearbooks, helps inform modeling efforts and assess disease risk.
O3: Characteristics of natural history
The natural history of each disease is complex and requires careful consideration1. Key aspects include distinguishing between infection and disease, understanding the potential for reinfection, and recognizing heterogeneity in infectiousness, latency, disease progression, and interactions with other infectious diseases. Obtaining data on the natural history characteristics of the disease involves referencing scientific literature and analyzing surveillance data.
O4: Intervention intensities
Incorporating interventions into disease models allows us to estimate the impact of improved diagnostics, new drugs, and different control measures. Data on intervention parameters, such as treatment efficacy, diagnostic accuracy, or implementation coverage, are typically obtained through a thorough review of scientific literature and relevant studies. These data help us assess the effectiveness and cost-effectiveness of interventions in controlling infectious diseases.
By systematically collecting and analyzing relevant data during the observation and data collection phase, we can enhance the accuracy and validity of infectious disease models. This, in turn, allows us to generate more reliable predictions and develop effective strategies for disease control and prevention. Once the necessary data has been collected and observed, the next step is to develop a mathematical model that represents the transmission dynamics of the infectious disease.
D: Developed model
Developing a mathematical model to represent the transmission dynamics of an infectious disease is a crucial step in epidemiological research. This model serves as a powerful tool for simulating and understanding how the disease spreads within a population, enabling us to explore different scenarios, assess intervention strategies, and make predictions about future trends.
D1: Assumptions and simplification
In choosing the most appropriate model, we start with the existing qualitative understanding of the epidemiological process of the disease and then select a model according to the disease type and the purpose of the study.
D1.1 Type of disease
Infectious diseases arise when a pathogen infects an organism and can be transmitted from person to person, animal to animal, or animal to human. There are many different types of infectious diseases, which can be broadly classified according to their transmission characteristics as gastrointestinal infectious diseases, respiratory infectious diseases, contact, blood-borne, and sexually transmitted diseases, and animal- and vector-borne infectious diseases. Depending on the category to which the disease under study belongs, we can choose between a purely human-to-human transmission model or a cross-population transmission model.
D1.2 Objectives of the study
Models can express the epidemiological process of a disease in symbolic, numerical formulas that quantitatively reveal its inner laws and can be used for analysis, interpretation, prediction, control, or decision evaluation. Further analytical studies of infectious diseases of various types can then be carried out, specifically disease prediction, estimation of transmission capacity, and evaluation of the effectiveness of interventions. For example, when simulating the effects of an intervention, the parameters and links to be evaluated for a single intervention or a combination of interventions need to be matched, and the parameters further supplemented or adjusted for the evaluation3. It is often possible to construct a transmission model with single or multiple control measures, to simulate epidemic trends under single or combined measures, and thus to assess the effectiveness of a particular control measure3 4.
D2: Choose mathematical theories to formulate the model
We classified mathematical models as either data-driven or mechanism-driven (Table 1).
D2.1 The "Data-driven" model
The "data-driven" model comprises a series of models exploring the relationship between disease occurrence and time, which is also a hot topic in the mathematical modeling of infectious diseases in China. Common methods include the time regression model, control chart method, time series model, Autoregressive Integrated Moving Average (ARIMA) model, Monte Carlo algorithm model, gray theory model, neural network model, etc.
D2.2 The "Mechanism-driven" model
The "mechanism-driven" model is classified by research object type and parameters into: 1) group and deterministic models, such as transmission dynamics models; and 2) individual and random models, such as agent-based models, multi-agent systems, and cellular automata.
Table 1
Overview of data-driven and mechanism-driven models for epidemic modeling.
Data-driven models: time regression model; logistic differential equation model; control chart method; ARIMA; Monte Carlo algorithm model; gray theory model; neural network model; others.
Mechanism-driven models: ordinary differential equations (ODE); stochastic individual or agent-based modeling; others.
D3: Analytical/Numerical solutions to model
Apart from very simple models that can be solved analytically, almost all models are too complicated to admit an analytical solution and must be solved numerically, typically on a computer. In general, the step after model formulation is to find solutions to the model. The existence and uniqueness of the solution should be checked in this step. If a solution does not exist, one must go back and check the model development process. In some large projects, this step may be called "building a computational model for the model".
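As a minimal sketch of what solving a model numerically can look like, the following Python code integrates an SEIR model with a fixed-step 4th-order Runge-Kutta scheme. The SEIR structure, the function names, and the rate parameters `latent_rate` and `recovery_rate` are illustrative assumptions, not the specific model or notation of this framework.

```python
def seir_rhs(state, beta, latent_rate, recovery_rate, n):
    """Right-hand side of an illustrative SEIR model (assumed structure)."""
    s, e, i, r = state
    new_exposed = beta * s * i / n
    return (
        -new_exposed,
        new_exposed - latent_rate * e,
        latent_rate * e - recovery_rate * i,
        recovery_rate * i,
    )


def rk4_step(f, state, dt, *args):
    """Advance the state by one classical 4th-order Runge-Kutta step."""
    k1 = f(state, *args)
    k2 = f(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)), *args)
    k3 = f(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)), *args)
    k4 = f(tuple(x + dt * k for x, k in zip(state, k3)), *args)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def solve_seir(s0, e0, i0, r0, beta, latent_rate, recovery_rate, days, dt=0.1):
    """Return the list of (S, E, I, R) states at every time step."""
    n = s0 + e0 + i0 + r0
    state = (float(s0), float(e0), float(i0), float(r0))
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        state = rk4_step(seir_rhs, state, dt, beta, latent_rate, recovery_rate, n)
        trajectory.append(state)
    return trajectory
```

A quick sanity check on such a solver is that the total population S + E + I + R stays constant along the trajectory, since this sketch has no births or deaths.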
E: Examination
After developing and analyzing the mathematical model of infectious disease transmission, it is crucial to subject the model to thorough examination and evaluation. This step is essential for assessing the model's validity and accuracy, as well as identifying potential areas for improvement. By examining the model's performance, we can ensure that it aligns with empirical observations and provides meaningful insights into the dynamics of infectious diseases.
E1: Stability
Model stability refers to the degree of consistency in the output of a model when there are slight variations in the epidemic data5. In epidemiological research, models are often employed to predict disease transmission patterns, assess the effectiveness of interventions, and inform public health decision-making. If a model lacks stability, even minor changes in the input data can lead to significant variations in the output, thereby impacting our understanding of disease dynamics and the accuracy of intervention strategies.
E2: Estimation for model
Once a model has been formulated from specific knowledge of the mechanism and the mathematics, it must be examined before it can be used for prediction, estimation, or other applications. First, we need to check whether the model is self-consistent, i.e., that it does not obviously contradict existing theory. For example, if one derives that "a basic reproduction number less than 2 means the disease will spread over almost the entire population", then something has clearly gone wrong. Second, we need to ensure that the model is well organized and robust to small noise and missing data. Such examination involves stability analysis of the model equations, sensitivity analysis of the parameters, and error analysis of the numerical methods used to solve the model. After the behavior of the model has been tested analytically or numerically, we still need to check whether the model explains the data already accumulated and whether it performs better than existing models. In such analyses, modelers may apply parameter fitting, smoothing, or filtering techniques to estimate state variables and parameters6 7.
E3: Parameter estimation and interpretability
Parameters can usually be divided into two categories: scenario-specific parameters and disease-specific parameters. Scenario-specific parameters capture differences in transmission between locations, populations, and times, and are represented by the transmission rate coefficient. Initial values of the model variables, such as the number of susceptible persons, the number of infectious sources, and the number of immunized persons in the study area, also need to be set after parameter estimation. Disease-specific parameters are the commonly used parameters of natural history.
E3.1: Estimate the transmission-specific parameters
Transmission-specific parameters mainly include the transmission rate (β), population exposure, and the probability of infection for a single exposure. Such parameters can be estimated in two ways: through field surveys, such as exposure surveys, or through simulation, e.g., by fitting actual epidemic data.
In the cross-sex model, β needs to be split into the transmission rates from male to male (βmm), male to female (βmf), female to female (βff), and female to male (βfm). In the model across age groups, β needs to be split into the transmission rate between different age groups (βij) and the transmission rate within age groups (βii). In models that consider contaminants in the environment, the environmental transmission coefficient to the population (βw) also needs to be considered. In cross-population models, the transmission coefficient (βa) from the animal or vector to the population also needs to be considered.
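To illustrate how a split transmission rate enters a model, the sketch below computes the force of infection on each group from a matrix of group-to-group rates βij. The frequency-dependent form (dividing infectious counts by group size) and the function name are assumptions made for illustration; the appropriate form depends on the model being built.

```python
def force_of_infection(beta, infectious, population):
    """Force of infection on each group i: sum over j of beta[i][j] * I_j / N_j.

    beta[i][j] is the (assumed) transmission rate from group j to group i,
    so beta[i][i] plays the role of the within-group rate.
    """
    return [
        sum(b_ij * i_j / n_j for b_ij, i_j, n_j in zip(row, infectious, population))
        for row in beta
    ]


# Hypothetical two-group example (e.g. two age groups).
beta_matrix = [[0.5, 0.1],
               [0.1, 0.3]]
lam = force_of_infection(beta_matrix, infectious=[10, 0], population=[100, 200])
```

The same pattern applies to the cross-sex split, with the four rates βmm, βmf, βff, and βfm arranged as a 2 × 2 matrix.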
E3.2: Estimate the disease-specific parameters
Disease-specific parameters usually refer to disease natural history parameters, such as the incubation period (ω), latency period (ω’), disease duration (γ), infectious period, proportion of occult infections (p), proportion of severe cases (ps), and mortality (f). Such parameters vary considerably between diseases, whereas differences between regions for the same disease are usually less pronounced. When modeling, such parameters can be obtained from first-hand field data or, because they are often difficult to obtain in the field, from the literature; sensitivity or uncertainty analysis should be carried out for parameters taken from the literature.
E3.3: Estimate the intervention-specific parameters
Currently, the main preventive and control measures for infectious diseases include pharmacological interventions (vaccination and medication) and non-pharmacological interventions (isolating patients, wearing masks, increasing social distance, etc.). The effectiveness of non-pharmacological interventions has been confirmed by multiple studies, in which strict implementation of public health policies such as isolating cases, tracing close contacts, and social distancing successfully controlled the prevalence of various diseases. In the model, isolation is reflected in an increased isolation coefficient (φ), increased social distance is reflected in the population contact degree (x), and mask-wearing is reflected in a changed probability of infection per single contact (p). Studies evaluating the effectiveness of vaccination mainly use the vaccination rate (δ) and the vaccine effect parameters. For medication, studies evaluating the population-level prevention and control effect mainly use the shortening of disease duration (γ), the reduction of the severe illness rate among patients (q), and the reduction of the severe case fatality rate (fc).
L: Linking model indicators and reality
The goal of developing mathematical models for infectious disease transmission is to bridge the gap between theoretical insights and practical applications. While models provide valuable insights into the dynamics of disease spread, it is crucial to establish a strong link between model indicators and real-world observations. This ensures that the model's predictions and recommendations are relevant, reliable, and actionable in the context of disease control and prevention.
L1: Indicators of disease transmissibility
Basic reproduction number, R0: an important indicator used to measure the transmissibility of an infectious disease. R0 is defined as the number of new cases generated by an infected individual in an otherwise fully susceptible population and in the absence of interventions. The greater the R0, the greater the transmissibility of the infectious disease8.
When R0 < 1, the disease will not cause an epidemic, the number of infections will decrease and the disease will be gradually eliminated.
When R0 > 1, the disease will cause an epidemic.
Thus, R0 = 1 is the threshold for the transmission of infectious diseases9.
From the definition, it is clear that calculating R0 requires a stringent condition, namely that the whole population is susceptible. The proportion of susceptible individuals declines gradually as the epidemic progresses or interventions are implemented, at which point it is no longer appropriate to use R0 to measure transmissibility. The effective reproduction number, Reff, or the time-varying reproduction number, Rt, should be applied instead10.
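One common way to relate the two quantities, shown here as a sketch, is to scale R0 by the susceptible fraction of the population. This relation assumes homogeneous mixing and no other changes in transmission (for example, no change in contact rates); the function name is illustrative.

```python
def effective_reproduction_number(r0, susceptible, population):
    """Reff = R0 * S / N, assuming homogeneous mixing (an assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return r0 * susceptible / population


# With R0 = 2.5 and 40% of the population still susceptible,
# each case replaces itself on average exactly once.
reff = effective_reproduction_number(2.5, susceptible=400, population=1000)
```

Under this assumption, Reff falls below the threshold of 1 once the susceptible fraction drops below 1/R0.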
L2: Indicators of disease burden, epidemiological features and intervention effectiveness
Total attack rate (TAR): the percentage of the cumulative cases of a disease in the whole population during an epidemic. The formula is:
$$\text{TAR} = \frac{\text{Cumulative cases}}{\text{Total population}} \times 100\%$$
Total asymptomatic infection rate (AIR): the percentage of the cumulative asymptomatic infections of a disease in the whole population during an epidemic. The formula is:
$$\text{AIR} = \frac{\text{Cumulative asymptomatic infections}}{\text{Total population}} \times 100\%$$
Total infection rate (TIR): the percentage of the cumulative infections of a disease in the whole population during an epidemic. The formula is:
$$\text{TIR} = \frac{\text{Cumulative infections}}{\text{Total population}} \times 100\%$$
Thus, TIR = TAR + AIR.
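The three rates can be computed directly from cumulative counts, as in the sketch below. It assumes that the "cumulative cases" in the TAR are symptomatic cases, which is what the identity TIR = TAR + AIR implies; the function and argument names are illustrative.

```python
def burden_indicators(symptomatic_cases, asymptomatic_infections, population):
    """Return TAR, AIR and TIR as percentages of the whole population."""
    if population <= 0:
        raise ValueError("population must be positive")
    tar = 100.0 * symptomatic_cases / population
    air = 100.0 * asymptomatic_infections / population
    tir = 100.0 * (symptomatic_cases + asymptomatic_infections) / population
    return {"TAR": tar, "AIR": air, "TIR": tir}


# Hypothetical outbreak: 150 cases and 50 asymptomatic infections among 1000 people.
rates = burden_indicators(150, 50, 1000)
```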
Duration of outbreak (DO): the time interval from the start of transmission of the infectious disease to the end of the outbreak1. There are two ways to define DO: one is the time interval from the first case onset to the last case onset, i.e., the DO is calculated based on the epidemic curve; the other is the time interval from the first case onset to the last case recovery. The formula is:
$$\text{DO} = t_2 - t_1$$
where t1 is the date of onset of the first case and t2 is the date of onset or recovery of the last case.
Peak incidence: the maximum incidence or number of cases of an infectious disease in the smallest unit of time used in the calculation (e.g., day or week) during an epidemic.
Time of peak incidence: the specific time (e.g., date) at which the peak incidence of a disease occurs during an epidemic.
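As an illustration, the duration of outbreak, peak incidence, and time of peak incidence can all be read off a daily epidemic curve. The sketch below uses the onset-based definition of DO and assumes the curve is given as a list of daily onset counts starting from a known date; the function name and data are hypothetical.

```python
from datetime import date, timedelta


def epidemic_curve_summary(start, daily_onsets):
    """Summarize a daily epidemic curve (counts of case onsets per day)."""
    onset_days = [day for day, count in enumerate(daily_onsets) if count > 0]
    if not onset_days:
        raise ValueError("the epidemic curve contains no cases")
    # max() returns the first day reaching the maximum if there are ties.
    peak_day = max(range(len(daily_onsets)), key=lambda day: daily_onsets[day])
    t1 = start + timedelta(days=onset_days[0])   # onset of the first case
    t2 = start + timedelta(days=onset_days[-1])  # onset of the last case
    return {
        "peak_incidence": daily_onsets[peak_day],
        "time_of_peak": start + timedelta(days=peak_day),
        "duration_days": (t2 - t1).days,
    }


summary = epidemic_curve_summary(date(2024, 1, 1), [0, 1, 3, 7, 4, 0, 2, 0])
```

Using recovery dates instead of onset dates for t2 would give the second definition of DO.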
Severity rate: The proportion of severe cases in the total cases is one of the most important indicators of virulence.
Mortality rate: the proportion of deaths due to a disease among patients with that disease in a certain period of time, indicating the risk of death for patients with that disease1. The formula is:
$$\text{Mortality rate} = \frac{\text{Deaths due to a disease in a certain period of time}}{\text{Total number of cases}} \times 100\%$$
Secondary attack rate (SAR): also known as the secondary incidence rate, it is an important indicator of the ability of an infectious disease to spread. It is the number of susceptible contacts who develop the disease between the shortest and the longest incubation period, as a percentage of the total number of susceptible contacts. The formula is:
$$\text{SAR} = \frac{\text{Number of cases among susceptible contacts during the incubation period}}{\text{Total number of susceptible contacts}} \times 100\%$$
S: Substitute specified scenarios
In the realm of infectious disease modeling, the ability to substitute specified scenarios is a fundamental step in bridging theoretical insights with practical applications. By simulating and assessing specific scenarios, we can gain a comprehensive understanding of the potential outcomes of different interventions and policy measures.
S1: Simulating
Building upon the groundwork laid in the previous four steps, the next crucial phase involves running the infectious disease model using computational methods to simulate various scenarios of disease transmission. Researchers can choose to either develop their own custom model code or utilize pre-existing packages like the SimInf and EpiModel packages in R, or the epydemic and Eir packages in Python, which are specifically designed for infectious disease modeling.
Through simulation, we can explore the dynamic behavior of the disease under different conditions and interventions. By inputting specific parameters and variables, the model generates predictions and projections, providing valuable insights into the potential outcomes of various interventions and policy measures. Simulations allow us to assess the effectiveness of different control strategies and evaluate the impact of preventive measures on disease transmission.
S2: Model evaluation
Model fitting methods usually include least squares estimation (LSE) and maximum likelihood estimation (MLE); the root mean square error (RMSE), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC) are used to assess and compare the resulting fits. For differential equation models, a common algorithm for solving initial value problems for ordinary differential equations combines an adaptive step selection strategy with the 4th-order Runge-Kutta method on equidistant nodes as the discretization method.
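A minimal sketch of least-squares fitting is given below: it picks, from a list of candidate values, the transmission rate β that minimizes the sum of squared errors between observed and simulated counts, and then scores the fit. The simple SIR model with one-day Euler steps, the grid search, and all function names are illustrative assumptions; the AIC and BIC expressions are the usual least-squares forms under Gaussian errors, up to an additive constant.

```python
import math


def simulate_sir_cases(beta, gamma, n, i0, days):
    """Daily infectious counts from a simple SIR model (one-day Euler steps)."""
    s, i = n - i0, float(i0)
    series = [i]
    for _ in range(days):
        new_inf = beta * s * i / n
        new_rec = gamma * i
        s, i = s - new_inf, i + new_inf - new_rec
        series.append(i)
    return series


def sse(observed, predicted):
    """Sum of squared errors between two equally long series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def fit_beta_least_squares(observed, gamma, n, i0, candidates):
    """Grid search: return the candidate beta with the smallest SSE."""
    days = len(observed) - 1
    return min(
        candidates,
        key=lambda b: sse(observed, simulate_sir_cases(b, gamma, n, i0, days)),
    )


def rmse(observed, predicted):
    return math.sqrt(sse(observed, predicted) / len(observed))


def aic_least_squares(observed, predicted, n_params):
    """AIC = n * ln(SSE / n) + 2k; requires SSE > 0."""
    n_obs = len(observed)
    return n_obs * math.log(sse(observed, predicted) / n_obs) + 2 * n_params


def bic_least_squares(observed, predicted, n_params):
    """BIC = n * ln(SSE / n) + k * ln(n); requires SSE > 0."""
    n_obs = len(observed)
    return n_obs * math.log(sse(observed, predicted) / n_obs) + n_params * math.log(n_obs)
```

In practice a continuous optimizer would usually replace the grid search, and lower AIC or BIC values indicate the preferred model among those fitted to the same data.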
Further goodness-of-fit tests are required to determine whether the differences between the model results and the actual data are statistically significant. The goodness-of-fit tests used include the Chi-square test. The coefficient of determination (R²) can also be calculated and tested for statistical significance. Cox regression can be used to analyze vaccine effects, based on the time of entry into the cohort and the time to the endpoint. Methods such as multiple regression analysis and generalized linear models are also often used to adjust for confounding factors when analyzing influencing factors. Commonly used software includes SPSS, SAS, R, Python, Matlab, and Berkeley Madonna.
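The two goodness-of-fit quantities named above can be computed as in the following sketch. It returns only the Pearson Chi-square statistic; obtaining a p-value additionally requires the Chi-square distribution with the appropriate degrees of freedom, which is left to a statistics package. Function names are illustrative.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def chi_square_statistic(observed, expected):
    """Pearson Chi-square statistic; expected values must be positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical weekly case counts versus model output.
r2 = r_squared([10, 20, 30], [12, 19, 29])
chi2 = chi_square_statistic([12, 18], [10, 20])
```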
If the model evaluation results are unsatisfactory, it is necessary to revisit Step 3 and re-evaluate the model assumptions and construction. This iterative process ensures that the model aligns with real-world observations and produces reliable and accurate predictions. Once the model evaluation results meet the desired criteria, we can proceed in the infectious disease modeling process.
S3: Sensitivity
Parameter sensitivity refers to the degree of influence of model parameters on the model's output. In epidemiological research, sensitivity analysis of parameters is used to assess how changes in specific parameters impact the model's results. By altering model parameters, we can understand the contributions of each parameter to the outcomes, allowing us to optimize the model or provide more accurate predictions.
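One simple form of sensitivity analysis, sketched below, changes one parameter at a time by a fixed relative amount and reports the relative change in the output per unit relative change in the parameter. The SIR model, the choice of cumulative infections as the output, and the 10% perturbation are assumptions for illustration; global methods that vary several parameters together are also widely used.

```python
def sir_final_size(beta, gamma, n=10000.0, i0=1.0, days=365, dt=0.1):
    """Cumulative infections at the end of a simple SIR run (Euler steps)."""
    s, i = n - i0, i0
    for _ in range(round(days / dt)):
        new_inf = beta * s * i / n * dt
        new_rec = gamma * i * dt
        s, i = s - new_inf, i + new_inf - new_rec
    return n - s


def one_at_a_time_sensitivity(model, baseline, rel_change=0.1):
    """Relative output change per relative parameter change, one parameter at a time."""
    base_output = model(**baseline)
    result = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1.0 + rel_change)
        result[name] = (model(**perturbed) - base_output) / base_output / rel_change
    return result


sensitivity = one_at_a_time_sensitivity(sir_final_size, {"beta": 0.4, "gamma": 0.2})
```

A positive value means the output grows with the parameter, and a negative value means it shrinks; the magnitudes allow parameters to be ranked by influence.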
The "Knockout" simulation is derived from knockout technology (an experimental technique used in genetics in which a normal gene is replaced by a defective gene at an identical chromosomal locus, so that the normal gene is "knocked out" by the defective gene). In modeling studies, the simulation sets a parameter to zero and estimates the contribution of that parameter from the reduction in the number of cases or in the total incidence rate. For example, in the SEIARW model, the contribution of environment-mediated transmission is explored by setting βw to 0 and counting the number of cases reduced.
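The knockout idea can be sketched with a reduced model that has a person-to-person route (β) and an environment-to-person route (βw). The S-I-R-W structure below is a simplified stand-in and not the SEIARW equations; the shedding and decay rates, default values, and function names are hypothetical.

```python
def simulate_sirw(beta, beta_w, gamma=0.2, shed=0.1, decay=0.3,
                  n=10000.0, i0=1.0, days=365, dt=0.1):
    """Cumulative infections in a reduced S-I-R-W model (Euler steps).

    W is an environmental reservoir fed by shedding from infectious
    people and depleted by decay (assumed structure).
    """
    s, i, w = n - i0, i0, 0.0
    cumulative = i0
    for _ in range(round(days / dt)):
        new_inf = (beta * s * i / n + beta_w * s * w / n) * dt
        new_rec = gamma * i * dt
        dw = (shed * i - decay * w) * dt
        s, i, w = s - new_inf, i + new_inf - new_rec, w + dw
        cumulative += new_inf
    return cumulative


def knockout_contribution(params, knocked_out):
    """Cases averted, and their share of the baseline, when one parameter is set to 0."""
    baseline = simulate_sirw(**params)
    knocked = simulate_sirw(**dict(params, **{knocked_out: 0.0}))
    return baseline - knocked, (baseline - knocked) / baseline


cases_reduced, share = knockout_contribution({"beta": 0.3, "beta_w": 0.3}, "beta_w")
```

Because transmission routes interact nonlinearly, the contributions obtained by knocking out each route separately need not add up to the baseline total.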
The difference between model stability and parameter sensitivity lies in their respective focuses. Model stability concerns the impact of slight variations in the input data on the model's output, whereas parameter sensitivity focuses on the influence of changes in model parameters on the output. While both concepts involve model stability and reliability, model stability primarily addresses the overall stability of the model, while parameter sensitivity examines the impact of individual parameters. In the field of epidemiology, both model stability and parameter sensitivity analyses play crucial roles in understanding and improving the accuracy of epidemiological models.