Novel Uses of Three-Parameter Logistic Models and First-Derivative Models for the Coronavirus Disease (COVID-19) Epidemic in the United States, in Three Distinct Scenarios

Background: Forecasting the current coronavirus disease (COVID-19) epidemic in the United States necessitates novel mathematical models for accurate predictions. This paper examines novel uses of three-parameter logistic models and first-derivative models through three distinct scenarios that have not been examined in the literature as of July 14, 2020. Using publicly available data, statistical software was used to conduct a non-linear least-squares estimate to generate a three-parameter logistic model, with a subsequently generated first-derivative model. In the first scenario, a logistic model was used to examine the natural log of COVID-19 cases as the dependent variable (versus day number), on July 11 and May 1. Independent t-test analyses were used to test comparative coefficient differences across models. In the second scenario, a first-derivative model was derived from a base three-parameter logistic model for April 27, examining time to peak mortality and decrease in case fatality rate. In the third scenario, a first-derivative model of mortality through July 11 as the dependent variable, versus confirmed cases, was generated to look at case fatality rate relative to increasing cases.


Introduction:
In late 2019, a new coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified in Wuhan, China, and quickly spread to many countries around the world [1]. The virus caused a disease that is now abbreviated as "COVID-19" (coronavirus disease 2019) [2]. The first reported case of COVID-19 in the United States was on January 20, 2020 in the state of Washington, in a patient who had returned from Wuhan, China five days prior [3]. The reporting was confirmed on January 22, 2020 [1].
Since that time the disease has remained an epidemic in the United States. As of July 14, 2020 the Centers for Disease Control and Prevention (CDC) reported 3,296,599 total cases of COVID-19 in the U.S., and 134,884 total deaths [4].
Several mathematical models have been proposed to forecast the course of COVID-19 in countries around the world, with a few studies discussing the use of the logistic model in countries like China [5,6] and Saudi Arabia [7]. This type of mathematical model has roots in the classic susceptible-infectious-removed (SIR) epidemiological model [8]. It has also classically been used as an ecological model for population growth, to illustrate how a population's carrying capacity limits the reproductive growth of an organism [9]. An example of this application is in the study of bacterial growth [10]. A literature search yields only one paper published on the use of this model for COVID-19 in the United States as of July 14, 2020, examining the use of a five-parameter logistic growth model to forecast the epidemic's growth [6].
This study aims to examine unique and novel uses of a three-parameter logistic model generated via a non-linear least-squares analysis, with a subsequently generated first-derivative model, in three distinct scenarios. Application of these models through the use of similar scenarios has not been explored in the literature as of July 14, 2020.
The first scenario is to forecast peak cases, determine growth rate, and predict time to growth deceleration by using the natural log of cases as the dependent variable instead of observed case numbers. The second scenario will examine the use of the first-derivative model to predict time to peak mortality and predict trends in the case fatality rate (CFR). The third scenario will examine the combined use of the three-parameter logistic model and the first-derivative model to study daily change in mortality (CFR) relative to case increases.

Methods:
Using Stata/IC 16.0 statistical analysis software, non-linear least-squares estimation was carried out to generate a three-parameter logistic function model for publicly available U.S. COVID-19 case and mortality data from worldometers.info [13,14]. The model form is $y = b_1 / (1 + e^{-b_2 (x - b_3)})$. This paper will examine the use of this model for U.S. COVID-19 data, in three distinct scenarios.
In the first scenario, a three-parameter logistic model was formed using the natural log of U.S. COVID-19 total confirmed cases through July 11 as the dependent variable $y$, with the day number as the independent variable $x$. January 1, 2020 was denoted as day number 1. The model was represented in the form of $y = b_1 / (1 + e^{-b_2 (x - b_3)})$. The $b_1$ coefficient equaled the natural log of peak cases, the $b_2$ coefficient equaled the steepest growth rate of the natural log of cases, and the $b_3$ coefficient was the inflection point, a midway point expressed as a day number, where the slope (growth rate) of the sigmoid curve (S-curve) starts decelerating. Independent unpaired t-test analysis was used to compare coefficient pairs from several models, to test for statistical significance.
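The estimation step described above can be illustrated with a minimal sketch. It is not the Stata routine used in this paper: it is a plain Gauss-Newton iteration with step halving, written with the Python standard library only, and it assumes the model form $y = b_1 / (1 + e^{-b_2 (x - b_3)})$. The function names, the starting values, and the synthetic data are all hypothetical and serve only to show how the three coefficients are recovered from data.

```python
import math

def logistic3(x, b1, b2, b3):
    """Assumed three-parameter logistic curve: b1 / (1 + exp(-b2 * (x - b3)))."""
    z = max(-50.0, min(50.0, -b2 * (x - b3)))  # clamp so exp() stays finite
    return b1 / (1.0 + math.exp(z))

def sse(xs, ys, b):
    """Sum of squared residuals for coefficient vector b = (b1, b2, b3)."""
    return sum((y - logistic3(x, *b)) ** 2 for x, y in zip(xs, ys))

def gradient_row(x, b1, b2, b3):
    """Partial derivatives of the curve with respect to b1, b2, b3 at one x."""
    z = max(-50.0, min(50.0, -b2 * (x - b3)))
    e = math.exp(z)
    d2 = (1.0 + e) ** 2
    return (1.0 / (1.0 + e), b1 * (x - b3) * e / d2, -b1 * b2 * e / d2)

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, g):
    """Solve the 3x3 system a * s = g by Cramer's rule."""
    d = det3(a)
    return [det3([[g[i] if j == k else a[i][j] for j in range(3)]
                  for i in range(3)]) / d for k in range(3)]

def fit_logistic3(xs, ys, start, iterations=50):
    """Gauss-Newton least squares with step halving (illustrative only)."""
    b = list(start)
    for _ in range(iterations):
        rows = [gradient_row(x, *b) for x in xs]
        resid = [y - logistic3(x, *b) for x, y in zip(xs, ys)]
        jtj = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        jtr = [sum(r[i] * e for r, e in zip(rows, resid)) for i in range(3)]
        step = solve3(jtj, jtr)
        current = sse(xs, ys, b)
        t = 1.0
        while t > 1e-6:
            trial = [bi + t * si for bi, si in zip(b, step)]
            if sse(xs, ys, trial) < current:  # accept only improving steps
                b = trial
                break
            t /= 2.0
        else:
            break  # no improving step found: treat as converged
    return tuple(b)

# Hypothetical usage on synthetic, noise-free data (not the paper's data).
days = list(range(61))
values = [logistic3(d, 100.0, 0.2, 30.0) for d in days]
estimate = fit_logistic3(days, values, start=(105.0, 0.18, 29.0))
```

Dedicated statistical software additionally reports standard errors, confidence intervals, and p-values for each coefficient, which this sketch does not attempt.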
In a second scenario, a first-derivative model was generated following logistic modeling of observed U.S. COVID-19 mortality through April 27, 2020 as the dependent variable $y$, with day number as the independent variable $x$. The first-derivative model was in the form of $dy/dx = b_1 b_2 e^{-b_2 (x - b_3)} / (1 + e^{-b_2 (x - b_3)})^2$. The observed derivative values of the change in mortality versus change in day number were then plotted along with the generated first-derivative model.
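For reference, the first-derivative model follows from the logistic model by direct differentiation. The short derivation below assumes the logistic form $y = b_1 / (1 + e^{-b_2 (x - b_3)})$; it is a worked sketch under that assumption rather than a reproduction of the software output.

```latex
\begin{align*}
y(x) &= \frac{b_1}{1 + e^{-b_2 (x - b_3)}} \\
\frac{dy}{dx} &= \frac{b_1 b_2 \, e^{-b_2 (x - b_3)}}{\left(1 + e^{-b_2 (x - b_3)}\right)^{2}}
              = b_2 \, y \left(1 - \frac{y}{b_1}\right) \\
\frac{d^{2}y}{dx^{2}} &= b_2 \, \frac{dy}{dx} \left(1 - \frac{2y}{b_1}\right) = 0
  \quad\Longleftrightarrow\quad y = \frac{b_1}{2}
  \quad\Longleftrightarrow\quad x = b_3 \\
\left.\frac{dy}{dx}\right|_{x = b_3} &= \frac{b_1 b_2}{4}
\end{align*}
```

Under this form, the first-derivative curve peaks at the inflection point $x = b_3$ of the base logistic model, with a peak height of $b_1 b_2 / 4$.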
In the third and final scenario, a logistic model was generated for U.S. COVID-19 mortality through July 11, 2020 as the dependent variable $y$, with U.S. COVID-19 total confirmed cases through July 11, 2020 as the independent variable $x$. A subsequently generated first-derivative model was then plotted along with the observed derivative values of these two variables, to show trends in CFR versus increasing number of cases. The methods employed in this third scenario were useful to track CFR changes in U.S. COVID-19 mortality relative to increases in U.S. cases following rapid acceleration after mid-June.

Results:

Scenario 1: Forecasting Peak Cases, Growth Acceleration, and Time to Growth Deceleration
In the first scenario, the generated three-parameter logistic model is shown in Fig. 1. It was generated for data through July 11, 2020, starting with the data point from day 63, corresponding to March 3, when the data showed a visually consistent pattern. The coefficient of determination ($R^2$) for this model was 0.9997, higher than the 0.9989 for the model generated from data starting on day 1. Therefore, all model comparisons in this specific scenario start from day 63.
The three-parameter logistic model in Fig. 1 was statistically significant, with a reported p-value of 0.0000 and an $R^2$ of 0.9997. Therefore, greater than 99.9% of the variability in the natural log of cases can be accounted for by changes in the day number, per this model, and these changes are highly unlikely to be due to chance alone. All generated coefficients ($b_1$ through $b_3$) were also statistically significant at p-values of 0.000. Table 1 summarizes the statistics for all model coefficients, including their 95% confidence intervals (95% CI). As shown in Table 1, the $b_1$ coefficient for the above model was 14.5026 and the $b_3$ coefficient was 71.08833. Compared to much earlier in the pandemic, the most recent model at the time of this writing therefore shows variability in the peak case coefficient ($b_1$) of 0.25441, which translates to a difference of 446,539 cases when $e^{b_1}$ is calculated for each model. The variation in the inflection point ($b_3$) was much more telling, exhibiting less than 1-day variability (0.21921 days) compared to 93 days prior.
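The conversion from the log-scale coefficient to a case count can be checked with a few lines of arithmetic. The sketch below uses the reported $b_1$ of 14.5026 and the reported variability of 0.25441. Treating the earlier model's $b_1$ as lower by that amount is an assumption made here; it is the direction that reproduces the reported difference in cases. The variable names are illustrative.

```python
import math

# Reported coefficient from the most recent model (natural log of peak cases).
b1_recent = 14.5026
# Reported variability in b1 relative to the earlier model.
b1_variability = 0.25441

# Assumption: the earlier model's b1 was lower by the reported variability.
b1_earlier = b1_recent - b1_variability

# Back-transform from the natural-log scale to case counts.
peak_cases_recent = math.exp(b1_recent)
peak_cases_earlier = math.exp(b1_earlier)
difference_in_cases = peak_cases_recent - peak_cases_earlier

print(round(peak_cases_recent), round(peak_cases_earlier), round(difference_in_cases))
```

Because the back-transformation is exponential, a small shift in $b_1$ on the log scale corresponds to a difference of several hundred thousand cases on the original scale.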
Another application of this three-parameter logistic model was employed earlier in the epidemic, when U.S. cases started to flatten due to continued social distancing and proper precautions. Figure 2 shows the generated three-parameter logistic model for U.S. COVID-19 cases through May 1, 2020, with the dashed line denoting the model for three weeks prior to May 1. Although the models are visually very similar, statistical analysis comparing the $b_1$ and $b_3$ coefficients from both models showed statistically significant differences between each respective pair. This showed that implemented public health measures at that time of the epidemic yielded statistically detectable differences in total peak cases and time to growth deceleration, three weeks apart, even when not visually apparent. Table 2 shows the coefficient statistics for both models. Table 3 shows the statistically significant results of independent t-test analyses comparing the $b_1$ and $b_3$ coefficients from each model. The analyses assumed a null hypothesis $H_0$ stating that there is no difference between each pair of comparative coefficients, and an alternative hypothesis $H_a$ stating that there is a difference, at a desired α-value of 0.05. In contrast, when a three-parameter logistic model was generated for case data two weeks prior to May 1, t-test analyses between the respective coefficients did not show statistical significance. For example, independent t-test analysis comparing the $b_2$ coefficients, representing the steepest case growth rates, yielded a p-value of 0.1114. Therefore, the results were not statistically significant and $H_0$ would not be rejected in this case, indicating no statistically detectable difference between the two coefficients and suggesting that the growth rate is stabilizing between models.
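The comparison of a coefficient pair from two models can be sketched as follows. The exact t-test configuration used in the statistical software is not reproduced here. This illustration assumes an unpooled two-sample statistic built from each coefficient's estimate and standard error, and it approximates the two-sided p-value with the standard normal distribution, which is only reasonable when the residual degrees of freedom are large. The function name and all input values are hypothetical.

```python
import math

def compare_coefficients(est_a, se_a, est_b, se_b, alpha=0.05):
    """Compare two independent coefficient estimates.

    Returns (t_statistic, approximate_two_sided_p, reject_null).
    Assumption: a normal approximation stands in for the t distribution.
    """
    # Standard error of the difference for independent estimates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    t_stat = (est_a - est_b) / se_diff
    # Two-sided tail probability under the standard normal distribution.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value, p_value < alpha

# Hypothetical inflection-point estimates (day numbers) from two models.
t_stat, p_value, reject = compare_coefficients(95.2, 0.30, 93.8, 0.35)
print(t_stat, p_value, reject)
```

A rejected null hypothesis corresponds to a statistically detectable shift in that coefficient between the two model dates; a non-rejected one corresponds to the stabilizing case described above.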

Scenario 2: Forecasting Time to Peak Mortality and Case Fatality Rate Trends
Data for U.S. COVID-19 mortality through April 27 was used as the dependent variable $y$, with day number as the independent variable $x$. A three-parameter logistic model was then generated using a non-linear least-squares estimation method. A subsequently generated first-derivative model was then calculated. In Fig. 3, the first-derivative function is plotted with the observed derivative values for the two variables. This method's main utility was in tracking changes in mortality, or CFR, with progressing days. The overlying first-derivative model predicted time to peak mortality, and also showed the trending CFR. When it was generated on April 27, 2020, the model showed that peak mortality occurred between days 101 and 102 (April 12 to April 13) and that the CFR was trending below 1,000 deaths/day at the time the model was generated, with a forecast for a continued downward trend (which has been true, despite increasing cases as of July 11).
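A minimal sketch of this scenario's two plotted quantities is given below. It assumes the first-derivative form $dy/dx = b_1 b_2 e^{-b_2 (x - b_3)} / (1 + e^{-b_2 (x - b_3)})^2$, under which the derivative peaks at day $b_3$ with height $b_1 b_2 / 4$. The coefficient values and the cumulative death counts in the example are hypothetical, not the fitted values from this paper.

```python
import math

def logistic3_derivative(x, b1, b2, b3):
    """Assumed first derivative of the three-parameter logistic curve."""
    z = max(-50.0, min(50.0, -b2 * (x - b3)))  # clamp so exp() stays finite
    e = math.exp(z)
    return b1 * b2 * e / ((1.0 + e) * (1.0 + e))

def derivative_peak(b1, b2, b3):
    """Day number and height of the derivative's peak under the assumed form."""
    return b3, b1 * b2 / 4.0

def observed_daily_change(cumulative):
    """Day-over-day differences of a cumulative series (observed derivative)."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Hypothetical coefficients for a cumulative mortality curve.
b1, b2, b3 = 60000.0, 0.12, 101.5
peak_day, peak_daily_deaths = derivative_peak(b1, b2, b3)

# Hypothetical cumulative death counts on consecutive days.
daily_changes = observed_daily_change([1000, 1800, 2900, 4100])
modelled = [logistic3_derivative(day, b1, b2, b3) for day in range(95, 110)]
```

Plotting the observed day-over-day changes against the modelled derivative gives the kind of comparison shown in Fig. 3.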

Scenario 3: Forecasting Case Fatality Rate Changes Relative to Accelerated Increase in Cases
To examine changes in mortality (CFR) relative to the accelerating U.S. COVID-19 case increases since mid-June to date, a combination of a three-parameter logistic model with a subsequent first-derivative model was used. First, a non-linear least-squares regression was carried out on U.S. COVID-19 mortality through July 11, 2020 as the dependent variable $y$, with U.S. COVID-19 total confirmed cases through the same date as the independent variable $x$, and a three-parameter logistic model was then generated (Fig. 4). A model with a perfect $R^2$ of 1.0000 was achieved when the model was fitted starting from data on day 109, April 20, 2020, when the U.S. was slightly under 1 million COVID-19 cases (~800,000).
The generated three-parameter logistic model was statistically significant at a p-value of 0.0000 with a perfect $R^2$ of 1.0000, wherein 100% of the variance in mortality is predictable from the cases. Table 4 shows the model coefficient statistics. A subsequently generated first-derivative model was then calculated for these two variables, to track daily change in mortality (CFR) relative to daily changes in U.S. COVID-19 cases through July 11, 2020. Figure 5 shows the generated first-derivative model plotted against the observed derivative data for the two variables.
As the observed data show, changes in mortality relatively flattened once the U.S. was past 3 million confirmed COVID-19 cases (on July 5, 2020). The model forecasts an effective CFR versus change in cases of near-zero starting around 4 million confirmed U.S. COVID-19 cases, if current conditions remain constant as of July 11. Furthermore, fluctuations in mortality since June 29 have not affected the fit of the generated three-parameter logistic model (which remained with a perfect $R^2$ of 1.0000 when generated daily since then), or the subsequent first-derivative model.
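The observed derivative values in this scenario are changes in cumulative deaths per change in cumulative confirmed cases. The sketch below shows one way to compute them by finite differences. The function name and the example counts are hypothetical, and days with no increase in cases are returned as None rather than dividing by zero.

```python
def deaths_per_new_case(cumulative_cases, cumulative_deaths):
    """Finite-difference estimate of d(deaths)/d(cases) between reports."""
    ratios = []
    for i in range(1, len(cumulative_cases)):
        new_cases = cumulative_cases[i] - cumulative_cases[i - 1]
        new_deaths = cumulative_deaths[i] - cumulative_deaths[i - 1]
        # Undefined when the case count did not increase between reports.
        ratios.append(new_deaths / new_cases if new_cases > 0 else None)
    return ratios

# Hypothetical cumulative counts on four consecutive days.
cases = [3_000_000, 3_050_000, 3_110_000, 3_170_000]
deaths = [132_000, 132_600, 133_100, 133_500]
print(deaths_per_new_case(cases, deaths))
```

A ratio that shrinks as cumulative cases grow corresponds to the flattening of mortality relative to cases described above.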

Discussion:
In the first scenario, the use of the natural log of U.S. COVID-19 total confirmed cases as the dependent variable, and not the number of observed cases directly, was a novel approach in COVID-19 modeling. This approach was useful in determining the accuracy of the $b_1$ and $b_3$ coefficients: the peak of the natural log of cases, and the inflection point midway on the S-curve correlating to the day number denoting deceleration in the natural log of cases, respectively. The small variability in both coefficients compared to models from earlier in the epidemic, especially for the inflection point, which was less than 1 day, has pragmatic importance. In a real-world application, this use of a three-parameter logistic model shows highly reliable predictability in the day number representing the point of deceleration for case growth, whether earlier in the epidemic or at later points. Estimation of peak cases ($b_1$) is discussed as a possible limitation later in this paper.
The independent t-test analyses carried out in the first scenario also have practical applications. For example, statistical significance between comparative coefficients from different models indicates that there are improvements in different parameters of the epidemic. On the other hand, lack of statistical significance in the respective coefficients signifies that case numbers are stabilizing to a point so as not to show statistically discernible differences in these parameters: peak cases, growth rate, or time to deceleration. Both results signify improvements in the epidemic that are not immediately obvious from the visual models or statistical analyses alone. Therefore, this is one measure to assess the effectiveness of epidemiologic public health measures when they are being adhered to. This also has implications as a tool for public health leaders, to help in their communication with the general public and encourage adherence to proposed public health safety measures.
In the second scenario, a first-derivative model generated from a base three-parameter logistic model was useful as another method to objectively gauge control of the epidemic by forecasting CFR trends, specifically during times of already-instituted public health measures. This approach would have been useful during a time when cases of the COVID-19 epidemic were beginning to flatten in the United States, such as at the time this scenario's model was generated (April 27). At such a time, the general public would be concerned with the time until restrictions are eased. This model would have served to give encouraging information that peak mortality had passed and would have had utility in predicting a good future date to begin policy implementations for easing certain restrictions, based on an effective CFR of near-zero. Given the individualistic culture of the United States, such information can serve to help the general public understand that adherence to certain public health safety measures is necessary for a short while longer.
The third and final scenario has high-yield utility at a time when the COVID-19 epidemic in the United States is experiencing accelerated growth, after initially plateauing for a brief interval. With a growing number of Americans choosing not to wear masks, a topic of public debate [11], a model such as this could serve as a tool to guide public health leadership in making individualized epidemiologic recommendations that take into account the non-uniformity of American culture. The first-derivative model in this scenario proved to be a novel approach to examining actual daily changes in mortality versus daily changes in case rates, a marker of the COVID-19 epidemic's impact with continued case increases.
With these models, decreasing CFR relative to increasing total cases of COVID-19 may indicate that aggressive treatments are being utilized readily in hospitalized patients, that those contracting the virus are not dying from it, or that the most vulnerable are being better protected. For example, it is known that older adults and individuals with certain chronic medical conditions are most susceptible to severe illness or death from COVID-19 [12]. If CFR continues a downward trend, that may indicate that more young individuals with mild or no symptoms are contracting the virus and not dying from it. On the other hand, if according to this model CFR starts to increase relative to accelerated case increases, then that may indicate that the young and healthy are starting to transmit the virus back to older individuals like parents, relatives, neighbors, etc. This type of conclusion is not only a critical objective measure of the current course of the virus in the population but can also serve to guide public health safety recommendations that are focused, specific, and individualized to help control the epidemic. Such recommendations may target the most vulnerable individuals at risk of death from COVID-19 and would also better take into account the current political and cultural climate in the United States.

Limitations
The accuracy of three-parameter logistic models in predicting peak cases depends on several factors being constant in the United States, like constant rates of testing, constant rates of infection, constant rates of recovery, and a predictable and constant changing rate of deaths. Any easing of restrictions, enacting new policies or public safety measures, or improvements in testing and/or treatments will affect all these constants, and consequently affect the growth rate and predicted peak cases. This is especially applicable to models that use the actual observed case numbers as the dependent variable, where significant differences in growth rate and peak cases are seen depending on when the model was calculated at different points in the epidemic. A similar limitation is discussed in other papers. In a study using the three-parameter logistic model to forecast COVID-19 cases in China, the authors noted that the model failed to estimate case numbers in the early stages of the epidemic, and was only accurate when there was an apparent maximum being reached in cases [5]. In another study using a five-parameter logistic model to estimate COVID-19 cases in the United States, data was used from March 21 to April 4, estimating a peak of about 800,000 cases [6]. As is evident now by the present number of cases, the model did not accurately predict present case numbers. The authors note that the model's long-term predictability for future new cases may not be accurate, and that it was limited to the data collected over the short interval in the study [6]. The study conducted in this paper found limitations similar to those discussed in these two studies [5,6] in the use of three-parameter logistic models to model COVID-19 data.
Of note, in the three-parameter logistic model study for China [5] the authors tested for heteroskedasticity in the data and examined residuals, two methods which were not part of this paper. Also, unlike a different study discussing use of the logistic model for cases in China [3], this paper did not examine applicability to other countries.
In the first scenario, the estimated $b_1$ coefficient, the peak of the natural log of U.S. COVID-19 cases, is calculated at 14.5026. This gives an estimate of 1,987,921 peak cases when $e^{b_1}$ is calculated. When the same model was run with the observed case data as the dependent variable, the peak cases predicted were 3,414,134, which is a closer estimate to the present number of U.S. COVID-19 cases as noted in the Introduction. The best utility, therefore, of the suggested three-parameter logistic model in the first scenario, using the natural log of cases as the dependent variable, would be to estimate the inflection point coefficient $b_3$, since it showed minimal variability. This is when the growth of the natural log of cases would start to decelerate and is effectively the same point of deceleration for the exponential growth of the actual observed case data.
The first-derivative models have few limitations, chiefly because they are useful in tracking changes in actual rates, whether in cases or in mortality, as discussed in the second and third scenarios. Therefore, the peaks of first-derivative models will correlate to the inflection point of their base three-parameter logistic model, the point when rate deceleration begins. Their utility in estimating "zero rate," however, is limited by the same parameters concerning peak cases in logistic models.

Conclusion:
For the U.S. COVID-19 epidemic, three-parameter logistic models have significant utility in predicting changes in the growth rate of the data and time to growth deceleration, with limitations in predicting peak cases. First-derivative models have similar utility in tracking changes in mortality (CFR), especially relative to increases in active cases. The models also have applicable public health safety and policy implications. Finally, the ultimate utility of these models is that they are repeatable, efficient, and replicable, with the ability to compare the statistical significance of coefficients generated from different models at different points in the epidemic.