In the US, the circumstances under which COVID-19 spreads vary dramatically among different States: not only are the levels of travel restriction orders dissimilar, but other factors affecting the disease spreading such as the population, medical resources, and social/political cultures are also distinct. A quantitative assessment of the effects of the control measures taken by the govern- ment to contain COVID-19 thus needs to be carried out on a State-by-State basis. A complication is that each individual State is not a closed system: people move into and out of the State on a daily basis. This presents a tremendous challenge to modeling [4, 5], as most current data analyses and models for COVID-19 were for the setting of a closed system without considering the inbound and outbound population movements [6–27]. Another feature of the COVID-19 pandemic that most existing models did not take into account is the non-Markovian nature of the spreading dynamics, as characterized by the various time delays associated with the dynamical states. To account for the non-Markovian characteristics in the model, coupled differential equations with distinct time delays are necessary [3].

To determine time **0 **for a State in the US treated as an open system, we develop a coupled, dual-system spreading model. In particular, we treat the target system (A) as one under influences from another, much larger system (B) that represents all the other States. Because the size of B is much larger than that of A, in terms of the spreading dynamics, system B can be regarded as a closed system. From the standpoint of nonlinear physics, the influences of system B on system A can be viewed as a perturbation or background noise, while the effects of A on B can be neglected. The perturbation can be estimated based on the population of the target State and the empirical human movement data. The backbone of this unidirectionally coupled modeling framework is a non-Markovian spreading model incorporating various time delays in a closed-system setting [3].

For the five-state (SHIJR) model in the closed system setting, there are three parameters whose values are to be determined from data: (1) the infection rate β - the probability that an individual in the S state catches the virus and switches into the H state, (2) *H*(*t*0) - the hidden population at the initial time *t*0 when the available data, i.e., the number of confirmed cases, began to increase with time to enable reliable estimation of the model parameters, and (3) the fraction of undocumented

infections, denoted as η, whose value is determined by the testing and surveillance capability of the government. For COVID-19, the state transitions in the SHIJR model are non-Markovian. In the open system setting, an additional feature exists: there is time dependence due to human movements in and out of the system. For both closed and open system models, the government control measures result in an exponential decrease in the human social and movement activities. The collective effects of these measures can be described by one parameter: the exponential decay rate of the activities, denoted by λ, where a larger rate corresponds to more stringent control measures. The value of λ can be estimated based on the available epidemic data.

A detailed mathematical description of the model is presented as a Supplementary Note in Supplementary Information (SI).

Figure 5 explains our principle to infer time **0 **in terms of a number of key dates underlying COVID-19 spreading. Some days after time **0**, the first or the first few cases emerged and were officially reported. However, the number of cases is typically small, e.g., one or two, and this number can remain unchanged for a number of days. Time *t*0 is the date after which the number of cases begins to increase. The government imposes control measures on date *t*1 *> t*0. Since the government control is an integral part of our model, it is necessary to use data up to some date later for determining the model parameters when the effects of the control measures have been manifested in the data. We assume that this would require at least 12 days and thus set

*t*2 = *t*1 + 12 days.

To estimate the four model parameters in a computationally feasible way while ensuring ac- curacy, we devise the following two-step procedure. We separate the four parameters into two groups: (β, *H*(*t*0), η) and λ, where the first three are associated with the intrinsic spreading pro- cess while the last is tied to the reduction in the human movements as a result of government control measures. Because of the time delay for the effects of the measures to be manifested, the value of λ does not affect the fitting with the data in the early stage of the disease. As a result, we first use the available data in the time interval [*t*0*, t*2] to estimate (β, *H*(*t*0), η). For each parameter, we assign an initial range of its possible values. A large number of combinations of the three parameter values are then used to evolve the model from *t*0 to generate the number of confirmed cases, *J*(*t*), in the time interval [*t*0*, t*2]. Using the real data, we carry out a weighted optimization procedure to determine a set of combinations of the three parameters with minimum errors. We then choose a range for λ and uniformly distribute a number of values of λ in this range. Each λ value is combined with the already determined combinations of (β, *H*(*t*0), η) to yield an equal number of combinations (β, *H*(*t*0), η, λ). With all the chosen λ values, this leads to a large number of combinations of the four parameters. Finally, we carry out the same optimization procedure in the time interval [*t*0*, t*3] with *t*3 *> t*2 to determine the combination of the four parameters with the minimum error. We choose *t*3 = *t*2 + 12 days.

With the values of the optimal parameters so estimated, the model generates the time series *J*(*t*) that can be compared with the data. In the time interval [*t*0*, t*3], the agreement is generally excellent. However, this is not indicative of the predictive power of the model because the model parameters are estimated using the data in the same time interval. To test the model for prediction, we run the model in the time interval [*t*3*, t*4], as indicated in Fig. 5. Comparison with the real data in this time interval reveals generally quite good agreement, validating the model.

The model is now ready to be used for inferring time **0**. Quite straightforwardly, for a given system (a State, a city, or a country), we set *H*(**0**) = 1, choose a number of possible candidates for time **0**, and run the model from time **0 **to *t*0 to test which candidate date leads to the known number *H*(*t*0): the starting date that gives the correct *H*(*t*0) value is taken to be time **0**.