Estimating the number of under-ascertained COVID-19 cases in Kano, Nigeria, in the fourth week of April 2020: a modelling analysis of the early outbreak

Background: The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in Wuhan, China, and has rapidly spread to over 200 countries and territories. In Nigeria, the Kano State Ministry of Health confirmed its first case of COVID-19 on April 11, 2020, and since then under-ascertainment may have occurred, roughly from 22 to 27 April 2020. As of 4 October 2020, there were 1738 reported COVID-19 cases in Kano with 54 associated deaths. In this work, we estimate the number of under-ascertained cases and the basic reproduction number, $R_0$, of COVID-19 in Kano, Nigeria. We also predict the number of COVID-19 cases in the short term. Methods: We used an exponential growth model to fit the outbreak curve of COVID-19 cases in Kano, Nigeria, from 11 to 30 April 2020. We estimated the number of under-ascertained cases using maximum likelihood estimation. We adopted the serial interval (SI) estimated for Hong Kong as an approximation of the unknown SI of COVID-19 in Kano to estimate $R_0$. We used an ARIMA model to provide a short-term (15-day) prediction of COVID-19 cases in Kano, Nigeria. Results: The initial growth phase followed an exponential growth pattern. Under-ascertainment likely resulted in 213 (95% CI: 106−346) unreported cases from 22 to 27 April 2020. On average, the reporting rate after 27 April 2020 increased up to 10-fold compared with the period from 22 to 27 April 2020. We estimated the $R_0$ of COVID-19 in Kano as 2.74 (95% CI: 2.53−2.96). We forecast that the total number of COVID-19 cases in Kano would reach 1067 (95% CI: 883, 2137) by June 6, 2020. Conclusion: Under-ascertainment likely existed during the fourth week of April 2020 and should be taken into account in future analyses and investigations.


Introduction
Coronaviruses are a group of related ribonucleic acid (RNA) viruses that belong to the family Coronaviridae (order Nidovirales) and are widely distributed in humans (Huang et al., 2020). Most coronavirus infections in humans cause mild symptoms. Outbreaks of two other coronaviruses (both betacoronaviruses), severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), have caused more than 10,000 cumulative cases in the past two decades, with fatality rates of about 10% for SARS-CoV and 37% for MERS-CoV (Huang et al., 2020). The novel coronavirus disease 2019 (COVID-19), which started in Wuhan, Hubei Province, China, and spread worldwide, was declared a Public Health Emergency of International Concern by the WHO on January 30, 2020 (WHO, 2020a), and was later characterized as the first pandemic caused by a coronavirus on March 11, 2020 (WHO, 2020b). As of June 3, 2020, COVID-19 had affected over 6.5 million people, with more than 386,000 deaths worldwide (WHO, 2020c).
Nigeria reported its first (imported) case of COVID-19 on February 27, 2020, following the NCDC case definition (NCDC, 2020; Ohia, 2020). Community transmission subsequently occurred owing to inadequate contact tracing from the index case and the lack of early border closures to prevent further spread. Control of the COVID-19 pandemic relies heavily on a country's health care system.
Nigeria is currently witnessing a rapid increase in COVID-19 cases, probably due to its poor health care system, which makes the country more vulnerable to the virus, especially with a population of over 200 million people (the highest in Africa). In the northern region, Kano State confirmed its first case of COVID-19 on April 11, 2020 (NCDC, 2020), and since then under-ascertainment may have occurred, roughly from 22 to 27 April 2020. This is likely due to a lack of sufficient health care supplies (such as test kits, gowns and facemasks), limited diagnostic testing of suspected patients, and other unknown reasons. As the commercial center and the most populous state in Nigeria (with more than 10 million people), Kano is likely the state most vulnerable to COVID-19 in northern Nigeria (Gilbert, 2020). As of June 3, 2020, 970 COVID-19 cases, including 45 deaths, had been confirmed in Kano, Nigeria (NCDC, 2020). Furthermore, little diagnostic testing for COVID-19 has been done in Nigeria: as of May 26, 2020, only about 69,801 people had been tested throughout the country (NCDC, 2020). This, together with the fact that autopsies and testing of deceased individuals are generally not carried out in the country (in some cases for traditional or religious reasons), suggests gross under-ascertainment of the true extent of the pandemic in the country.
Recently, numerous studies have used mathematical and statistical modelling to study the transmission dynamics of the COVID-19 pandemic since its emergence in Wuhan, China (Li, 2020; Zhao et al., 2020a; Zhao et al., 2020b; Ngonghala et al., 2020; Eikenberry et al., 2020; Tang et al., 2020; Wu et al., 2020; Musa et al., 2020; Gilbert et al., 2020; Lin et al., 2020). In this study, we aim to investigate the epidemiological patterns and to estimate the number of under-ascertained cases and the $R_0$ of the COVID-19 outbreak in Kano, Nigeria. We hope that our results will help inform the global community about under-ascertainment issues and the value of $R_0$, and thereby help curtail the spread of the virus. In addition, we make a short-term forecast of COVID-19 cases in order to anticipate possible scenarios and to inform decision makers in the country about the importance of sustaining stringent measures, as recommended by the WHO and other health organizations, since there is as yet no effective treatment or vaccine for the virus. All current measures are directed primarily at non-pharmaceutical interventions (NPIs), such as social (physical) distancing, community lockdown, quarantine of suspected cases, contact tracing, isolation of confirmed cases and the use of facemasks by the general public.

Data And Methods
According to the NCDC report, the cumulative number of COVID-19 cases in Kano stood at 73 from 22 to 24 April 2020 and at 77 from 25 to 27 April 2020 (after the addition of four new cases); that is, no new cases were reported within either of these three-day periods. This appears implausible given the rapid rise of the outbreak curve since the index case on 11 April 2020 in Kano (NCDC, 2020). We therefore presume that COVID-19 cases in Kano were under-ascertained, probably from 22 to 27 April 2020. In this paper, we estimate the number of under-ascertained cases from 22 to 27 April 2020 and the $R_0$ of COVID-19 in Kano, Nigeria, based on the data available during the early phase of the epidemic.
We used the time series of cumulative confirmed cases compiled by the NCDC (2020) from 11 to 30 April 2020. All cases were laboratory-confirmed according to the NCDC definition of a COVID-19 case, which is available at https://covid19.ncdc.gov.ng/report/. We restricted the study data to 11 to 30 April 2020 rather than extending them to the present date because diagnostic testing improved significantly toward the end of April 2020, and sufficient personal protective equipment (PPE) was then provided to frontline health workers.
We suspected that a number of COVID-19 cases, denoted by $\eta$, were not ascertained, most likely from 22 to 27 April 2020. The cumulative total number of cases on the $i$-th day since 11 April 2020, $C_i$, is the sum of the cumulative number of reported (ascertained) cases, $c_i$, and the cumulative number of under-ascertained cases, $\eta_i$. Following previous studies (Trotter et al., 2005; Zhao et al., 2020b), the expected number of under-ascertained cases is obtained from

$$C_i = c_i + \eta_i,$$

where $c_i$ is observed from the data, $\eta_i = 0$ for days before 22 April 2020 and $\eta_i = \eta$ for days after 27 April 2020. Following previous works (Zhao et al., 2020a; Zhao et al., 2020b; De Silva, 2009), we modelled the outbreak curve by treating the $C_i$ series as an exponentially growing Poisson process. Because the reported counts from 22 to 27 April 2020 were nearly constant, probably owing to poor testing facilities and other unknown reasons, these data were excluded from the exponential growth fitting. We estimated $\eta$ and the intrinsic growth rate $\gamma$ by maximizing the log-likelihood $\ell$ of a Poisson likelihood framework for the number of cases. The 95% confidence interval (95% CI) of $\eta$ was obtained with the profile likelihood technique, using a cutoff threshold determined by the Chi-square quantile $\chi^2_{\mathrm{pr}=0.95,\,\mathrm{df}=1}$ (Fan & Huang, 2005). We then obtained $R_0$ from the estimate of $\gamma$, as in previous studies (Zhao et al., 2020a; Zhao et al., 2020b; Musa et al., 2020). Assuming 100% susceptibility in the early stage of the COVID-19 outbreak, the basic reproduction number is (Zhao et al., 2020b)

$$R_0 = \frac{1}{\displaystyle\int_0^{\infty} e^{-\gamma\kappa}\, h(\kappa)\, d\kappa},$$

where $\kappa$ denotes the serial interval (SI) of COVID-19, which follows the probability density function $h(\kappa)$. This formula has been derived theoretically and adopted in previous studies (Zhao et al., …).
Since the transmission chain of COVID-19 in Africa has not yet been fully uncovered, we adopted SI information from previous works (Du et al., 2020; Nishura et al., 2020). We modelled $\kappa$ as a lognormal distribution with a mean of 5.0 days and a standard deviation (SD) of 1.9 days (Du et al., 2020; Nishura et al., 2020). Slightly changing the SI information is not expected to affect our main results and conclusion. We also evaluated the trend of the daily number of cases, denoted by $\delta_i$ for the $i$-th day, where $C_i = C_{i-1} + \delta_i$.
Following a previous study (Zhao et al., 2020b), we formulated a simulation algorithm based on the iterative Poisson distribution of the daily number of cases, $\delta_i$:

$$\delta_i \sim \mathrm{Poisson}\big(\mathrm{E}[C_i] - \mathrm{E}[C_{i-1}]\big),$$

where $\mathrm{E}[\cdot]$ denotes the expectation. For details of the simulation framework, see Zhao et al. (2020b). Furthermore, we employed an Autoregressive Integrated Moving Average (ARIMA) model, a time series model that has been used in a previous study for short-term prediction (Maleki, 2020). The model has three key parameters: $p$ (autoregressive order), $d$ (degree of differencing) and $q$ (moving average order). We assigned different values to these parameters to form candidate models. To fit a model with given values of $p$, $d$ and $q$, the conditional-sum-of-squares method is used to find starting values for the other parameters, and the maximum likelihood method is then applied, starting from these values, to calculate the final estimates (Shephard, 1997). Candidate models were compared using Akaike's Information Criterion (AIC) (Akaike, 1998). When making predictions, the ARIMA model assumes normally distributed residuals, which could make some predicted values negative; in line with the actual situation, we kept only non-negative predicted values.
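A minimal sketch of the iterative Poisson simulation, assuming the fitted exponential curve $\mathrm{E}[C_i] = C_0 e^{\gamma i}$. The values of $C_0$ and $\gamma$ below are hypothetical, not estimates from this paper, and the function name is illustrative. Each simulated trajectory draws the daily count $\delta_i$ from a Poisson distribution whose mean is the increment of the expected cumulative curve, then accumulates $C_i = C_{i-1} + \delta_i$.

```python
import numpy as np


def simulate_daily_cases(c0, gamma, n_days, n_sims=1000, seed=1):
    """Draw delta_i ~ Poisson(E[C_i] - E[C_{i-1}]) with E[C_i] = c0 * exp(gamma * i)."""
    i = np.arange(n_days + 1, dtype=float)
    expected_cum = c0 * np.exp(gamma * i)        # E[C_0], ..., E[C_n]
    expected_daily = np.diff(expected_cum)       # E[C_i] - E[C_{i-1}], i = 1..n
    rng = np.random.default_rng(seed)
    daily = rng.poisson(expected_daily, size=(n_sims, n_days))
    cumulative = c0 + np.cumsum(daily, axis=1)   # C_i = C_{i-1} + delta_i, starting at C_0 = c0
    return expected_daily, daily, cumulative


# Hypothetical parameters for illustration only.
expected_daily, daily, cumulative = simulate_daily_cases(c0=5, gamma=0.2, n_days=20)
median_daily = np.median(daily, axis=0)
lower, upper = np.percentile(daily, [2.5, 97.5], axis=0)  # 95% simulation envelope
```

The simulated median and percentile envelope are what would be plotted against the observed daily counts.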
The R statistical software (version 3.5.1) was used for the simulations in this study.

Results
We estimated the total number of under-ascertained COVID-19 cases as 213 (95% CI: 106−346), most likely from 22 to 27 April 2020 (Figure 1a). This estimate is clearly greater than zero, since the lower bound of its 95% CI is 106. We also estimated $R_0$ as 2.74 (95% CI: 2.53−2.96) (Figure 1b), which is largely consistent with previous findings (Zhao et al., 2020a; Zhao et al., 2020b; Musa et al., 2020). With $R_0$ = 2.74 and 213 under-ascertained cases, the exponential growth framework fitted the cumulative number of COVID-19 cases ($C_i$) very well, with a McFadden's pseudo-R-squared value of 0.99 (Figure 2a). Figure 2b shows the exponential growth fit to the daily number of COVID-19 cases in Kano, Nigeria.
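McFadden's pseudo-R-squared compares the log-likelihood of the fitted model with that of a constant-mean (intercept-only) model, $R^2_{\mathrm{McF}} = 1 - \ell_{\text{model}}/\ell_{\text{null}}$. The sketch below computes it for Poisson counts. It assumes that the null model is a constant Poisson mean equal to the sample mean, and it uses a hypothetical toy series rather than the Kano data, so its value is not comparable with the 0.99 reported above.

```python
import math

import numpy as np


def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return float(np.sum(y * np.log(mu) - mu) - sum(math.lgamma(v + 1.0) for v in y))


def mcfadden_r2(y, mu_model):
    """McFadden's pseudo-R^2 = 1 - loglik(model) / loglik(constant-mean null model)."""
    y = np.asarray(y, dtype=float)
    mu_null = np.full_like(y, y.mean())
    return 1.0 - poisson_loglik(y, np.asarray(mu_model, dtype=float)) / poisson_loglik(y, mu_null)


# Hypothetical toy data: counts that follow an exponential curve, and that curve as the fitted mean.
t = np.arange(20, dtype=float)
mu_fit = 5.0 * np.exp(0.2 * t)
y = np.round(mu_fit)
print(round(mcfadden_r2(y, mu_fit), 3))
```

A model no better than the constant mean scores 0; because the Poisson log-likelihood stays negative even for a perfect fit, the index stays below 1.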
The estimate of $R_0$ relies heavily on the SI of COVID-19. In this study, we adopted the SI information of COVID-19 from previous works (Du et al., 2020; Nishura et al., 2020) as an approximation to that of Kano, because estimating the SI requires sufficient time: it needs knowledge of the disease transmission chain, which requires an adequate number of patient samples and time for follow-up (Cowling, 2009), and this is difficult to achieve in a short period.
However, using the SI of Hong Kong as an approximation could provide reasonable insight into the transmission potential and features of COVID-19 in Kano in the early phase of the outbreak. We found that changing the mean and SD of the SI very slightly would not change our main results. The $R_0$ of COVID-19 in Kano was estimated at 2.74 (95% CI: 2.53−2.96), which is consistent with previously computed values of $R_0$ (Zhao et al., 2020a; Zhao et al., 2020b).
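The dependence of $R_0$ on the SI can be illustrated numerically with $R_0 = 1 / \int_0^\infty e^{-\gamma\kappa} h(\kappa)\, d\kappa$ and a lognormal SI parameterized by its mean (5.0 days) and SD (1.9 days). This is a sketch: the integral is approximated with the trapezoidal rule on a truncated grid (0 to 60 days, an assumption), and the growth rate $\gamma = 0.2$ per day is a hypothetical value, not this paper's estimate. The loop perturbs the SI mean and SD slightly, in the spirit of the sensitivity check described above.

```python
import numpy as np


def lognormal_params(mean, sd):
    """Convert the mean and SD of a lognormal distribution to (mu, sigma) on the log scale."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    return np.log(mean) - sigma2 / 2.0, np.sqrt(sigma2)


def r0_from_growth_rate(gamma, si_mean=5.0, si_sd=1.9, upper=60.0, n=200_001):
    """R0 = 1 / integral of exp(-gamma*k) h(k) dk, with h a lognormal SI density."""
    mu, sigma = lognormal_params(si_mean, si_sd)
    k = np.linspace(1e-9, upper, n)  # start just above 0 to avoid log(0)
    h = np.exp(-((np.log(k) - mu) ** 2) / (2 * sigma**2)) / (k * sigma * np.sqrt(2 * np.pi))
    f = np.exp(-gamma * k) * h
    dk = k[1] - k[0]
    integral = dk * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule
    return 1.0 / integral


gamma = 0.2  # hypothetical growth rate per day
for mean in (4.8, 5.0, 5.2):
    for sd in (1.7, 1.9, 2.1):
        print(f"SI mean={mean}, SD={sd}: R0={r0_from_growth_rate(gamma, mean, sd):.3f}")
```

With $\gamma = 0$ the integral is the total SI probability, so $R_0 = 1$; for small $\gamma$, $R_0 \approx 1 + \gamma \times$ (SI mean), which is why a longer mean SI yields a larger $R_0$ for the same growth rate.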
Figure 3 depicts the 15-day forecast of cumulative COVID-19 cases (red line) and the observed data (blue dots); the 95% confidence interval is marked in red. The indicators evaluating the performance of the model were: Mean Error (ME), -4.66; Root Mean Squared Error (RMSE), 7.18; Mean Absolute Error (MAE), 6.07; and Mean Absolute Scaled Error (MASE), 0.33. Furthermore, we present the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the reported COVID-19 cases in Kano (Appendix Figure 1). The Q-Q plot of the residuals shows that the residuals were normally distributed, and the Ljung-Box test gave a p-value of 0.54 (Appendix Figure 2).
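The four accuracy indicators can be computed from the in-sample errors $e_t = y_t - \hat{y}_t$. The sketch below assumes the usual non-seasonal definitions: ME is the mean error, RMSE and MAE are the root mean squared and mean absolute errors, and MASE divides the MAE by the in-sample MAE of the naive one-step forecast $\hat{y}_t = y_{t-1}$. The short series in the example are hypothetical.

```python
import numpy as np


def accuracy_measures(actual, fitted):
    """ME, RMSE, MAE and MASE with errors defined as actual - fitted."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    err = actual - fitted
    mae = np.mean(np.abs(err))
    naive_mae = np.mean(np.abs(np.diff(actual)))  # MAE of the naive forecast y_hat_t = y_{t-1}
    return {
        "ME": np.mean(err),
        "RMSE": np.sqrt(np.mean(err**2)),
        "MAE": mae,
        "MASE": mae / naive_mae,
    }


# Hypothetical example values.
print(accuracy_measures([1, 2, 4, 7], [1, 3, 3, 8]))
```

Under this sign convention a negative ME, such as the -4.66 above, means the fitted values exceed the observations on average; a MASE below 1 means the model beats the naive forecast in-sample.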

Discussion
The total number of under-ascertained COVID-19 cases was estimated as 213 (95% CI: 106−346) (Figure 1a). This result suggests that under-ascertainment most likely occurred from 22 to 27 April 2020. After accounting for the effect of under-ascertainment, we estimated $R_0$ as 2.74 (95% CI: 2.53−2.96). Figure 2b shows the simulated daily number of COVID-19 cases. The simulated daily number approximately equalled the observed daily number of cases after 27 April 2020 but was larger than the observed cases from 22 to 27 April 2020. This finding highlights that under-ascertainment probably existed in the fourth week of April 2020. We therefore estimated the reporting rate after 27 April 2020, which was found to have increased by up to 10-fold (95% CI: 5−16), on average, compared with the period before 27 April 2020. One possible reason is the poor testing facilities for diagnosing daily new cases, together with the lack of adequate PPE for frontline health workers. The number of newly reported daily cases started increasing very fast after 27 April 2020 (Figure 2b).
Although under-ascertainment is difficult to ameliorate for some diseases (Sethi et al., 1999), e.g., infectious intestinal disease (IID), this is unlikely to be the case for COVID-19 (Sethi et al., 1999). In general, to address under-ascertainment, cases and controls need to be appropriately included in case-control studies. In addition, knowing the exact number of ascertained cases, or keeping the risk of under-ascertainment low, is crucial for controlling a pandemic (https://catalogofbias.org/biases/ascertainment-bias/). Given that many COVID-19 infections are asymptomatic or mild, different reporting systems may use different criteria when choosing strategies to avoid or reduce under-ascertainment. We suggest that our estimates be considered in future analyses and investigations. Furthermore, estimating key epidemiological parameters from routine data during epidemics requires knowledge of when, where and to what extent these data represent the true scenario of the disease, and in some instances adjustments are necessary to avoid underestimation. Multiplication factors can also be used to adjust notification and surveillance data to provide more realistic estimates of incidence (Gibbons et al., 2014).
Numerous modelling studies of the transmission dynamics of COVID-19 have been published (Li, 2020; Zhao et al., 2020a; Zhao et al., 2020b; Ngonghala et al., 2020; Eikenberry et al., 2020; Tang et al., 2020; Wu et al., 2020; Musa et al., 2020; Gilbert et al., 2020; Lin et al., 2020). Some of these studies estimated the basic reproduction number from serial intervals and the intrinsic growth rate (Zhao et al., 2020a; Zhao et al., 2020b), while others used ordinary differential equations and Markov Chain Monte Carlo methods (Ngonghala et al., 2020; Eikenberry et al., 2020; Tang et al., 2020; Lin, 2020). However, few studies have examined the transmission of the COVID-19 pandemic in Africa (Gilbert et al., 2020; Musa et al., 2020).

We applied first-order differencing and the Augmented Dickey-Fuller test to the reported COVID-19 cases in Kano, Nigeria. The differenced series was found to be stationary (p = 0.017). Based on the lowest AIC, the best of all candidate models was ARIMA(2,1,0) (Table 1), which is consistent with the ACF and PACF of the differenced data (Appendix Figure 1). The Q-Q plot of the residuals (Appendix Figure 2) showed that the residuals were normally distributed, and the Ljung-Box test gave a p-value of 0.54, suggesting that the residuals are likely to be white noise. We forecast that the total number of COVID-19 cases would reach 1067 (95% CI: 883, 2137) by June 6, 2020. Figure 3 indicates that the fitted model is appropriate. The accuracy measures of the forecast model, namely ME (-4.66), RMSE (7.18), MAE (6.07) and MASE (0.33), further indicate the good performance of the fitted model for total confirmed cases.
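The Ljung-Box statistic tests whether the first $h$ residual autocorrelations are jointly zero: $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$, compared with a chi-square distribution whose degrees of freedom are $h$ minus the number of fitted ARMA coefficients (2 for ARIMA(2,1,0)). The sketch below is a self-contained illustration. The number of lags used in this study is not stated, so `h` is left as a parameter, and the closed-form p-value helper is valid only for an even number of degrees of freedom (an assumption made to avoid a SciPy dependency).

```python
import math

import numpy as np


def ljung_box_q(resid, h):
    """Ljung-Box Q statistic for lags 1..h of a residual series."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.sum(x**2)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.sum(x[k:] * x[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k**2 / (n - k)
    return n * (n + 2) * q


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))


# Hypothetical usage with residuals from an ARIMA(2,1,0) fit and h = 10 lags:
# q = ljung_box_q(residuals, h=10); p_value = chi2_sf_even_df(q, df=10 - 2)
```

A large p-value, such as the 0.54 reported above, gives no evidence against the residuals being white noise.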
Akaike's Information Criterion (AIC) is widely used to select the best model among alternatives (Wang et al., 2018; Ömer, 2010; Mondal et al., 2014); we calculated the AIC for each candidate model and chose the model with the lowest AIC. The 95% confidence intervals are obtained under the assumption that the residuals of the model are normally distributed.
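The AIC-based choice of the autoregressive order can be sketched for ARIMA($p$,1,0) candidates. To keep the example dependency-free, it swaps in a simpler estimator than the conditional-sum-of-squares-then-maximum-likelihood procedure used in this study: each AR($p$) model for the first differences is fitted by conditional least squares (no mean term, with all candidates conditioned on the same first `max_p` differences), and AIC $= 2k - 2\ell$ uses the conditional Gaussian log-likelihood with $k = p + 1$ parameters (AR coefficients plus the innovation variance). The function name and the simulated series are hypothetical.

```python
import numpy as np


def aic_ar_on_differences(y, max_p=3):
    """AIC of ARIMA(p,1,0) candidates (p = 0..max_p) fitted by conditional least squares."""
    dy = np.diff(np.asarray(y, dtype=float))
    target = dy[max_p:]                      # same effective sample for every candidate
    n_eff = len(target)
    aic = {}
    for p in range(max_p + 1):
        if p == 0:
            resid = target                   # ARIMA(0,1,0): differences are white noise
        else:
            # Column j holds the lag-j difference dy[t - j] for each target time t.
            X = np.column_stack([dy[max_p - j: len(dy) - j] for j in range(1, p + 1)])
            coef, *_ = np.linalg.lstsq(X, target, rcond=None)
            resid = target - X @ coef
        sigma2 = np.mean(resid**2)
        loglik = -0.5 * n_eff * (np.log(2 * np.pi * sigma2) + 1.0)
        aic[p] = 2 * (p + 1) - 2 * loglik
    best = min(aic, key=aic.get)
    return best, aic


# Hypothetical example: a simulated series whose differences follow an AR(2) process.
rng = np.random.default_rng(0)
d = np.zeros(400)
for t in range(2, 400):
    d[t] = 0.6 * d[t - 1] + 0.3 * d[t - 2] + rng.normal()
best_p, aic = aic_ar_on_differences(np.cumsum(d), max_p=3)
```

Conditioning every candidate on the same starting observations keeps the log-likelihoods comparable, which is what makes the lowest-AIC rule meaningful across orders.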
Figure 1: Estimates of the number of under-ascertained cases between 22 and 27 April 2020 and the basic reproduction number ($R_0$). Panel (a) shows the likelihood profile ($\ell$, dark black curve) of the estimated number of unreported cases ($\eta$) and the cutoff threshold (horizontal blue dashed line) for the 95% CI. Panel (b) shows the relationship between $\eta$ and $R_0$, where the bold blue curve is the mean estimate and the dashed blue curves are the 95% CI of the estimated $R_0$. In panels (a) and (b), the purple shaded area on the horizontal axis represents the 95% CI, and the vertical red line represents the maximum likelihood estimate (MLE) of the number of under-ascertained cases.

Figure 2
The COVID-19 pandemic is still causing serious damage to global public health and socioeconomic development. In the absence of a vaccine and effective treatment against this deadly and