An epidemiological modeling framework to inform institutional-level response to infectious disease outbreaks: A Covid-19 case study

Institutions have an enhanced ability to implement tailored mitigation measures during infectious disease outbreaks. However, macro-level predictive models are inefficient for guiding institutional decision-making due to uncertainty in local-level model input parameters. We present an institutional-level modeling toolkit used to inform prediction, resource procurement and allocation, and policy implementation at Clemson University throughout the Covid-19 pandemic. Through incorporating real-time estimation of disease surveillance and epidemiological measures based on institutional data, we argue this approach helps minimize uncertainties in input parameters presented in the broader literature and increases prediction accuracy. We demonstrate this through case studies at Clemson and other university settings during the Omicron BA.1 and BA.4/BA.5 variant surges. The input parameters of our toolkit are easily adaptable to other institutional settings during future health emergencies. This methodological approach has potential to improve public health response through increasing the capability of institutions to make data-informed decisions that better prioritize the health and safety of their communities while minimizing operational disruptions.


INTRODUCTION
The Covid-19 pandemic has caused major devastation and disruption globally. Since the onset of the pandemic, society has struggled to adapt to the unpredictable and changing nature of the pandemic. Institutions, including industry, health systems, and educational institutions, among others, faced a particularly difficult task of operating during Covid-19. [1][2][3][4] Many of the current public health guidelines to mitigate Covid-19 spread were undeveloped at the time such institutions reopened (e.g., pre-arrival testing for university students). 5 While disease mitigation policies implemented by governments in broad geographic regions were effective, 6 policies informed by state or county data were insufficient and/or inefficient for disease mitigation at the local level. 7,8 Population characteristics in institutes of higher education (IHE), for example, can be substantially different in terms of social networks and health seeking behavior relative to the general population. 9 Standard mitigation policies, including social distancing and masking, were not effective for preventing outbreaks in university student populations due to high social contacts and congregated housing. 10 Institutions, which have enhanced flexibility and ability to implement mitigation measures tailored to their populations, have utilized predictive modeling at the local level to guide decision making throughout the pandemic. IHE implemented predictive models to inform testing strategies, mask and vaccine mandates, online instruction, and other mitigation strategies to help curb disease transmission in their student and employee populations. 
[11][12][13][14] Accurate models are especially useful for IHE in the United States (US) and abroad, since 1) IHE students, faculty, and staff account for 7% of the US population and indirectly impact tens of millions including families and local communities, 13 2) disease transmission is elevated among students due to increased social contacts and congregated living, 10 and 3) IHE are more easily able to implement mitigation policies and behavioral interventions. 13 Several predictive Covid-19 models have been developed since the onset of the pandemic for case projections and intervention evaluation in other institutional settings, 15 including healthcare facilities, 16 long-term care facilities, 17 and K-12 schools. [18][19][20] However, many of these models rely on input parameters derived from broad geographic regions, which can lead to inaccurate projections for local populations. 7 When models are tailored to local populations, uncertainty in local-level input parameters including initial model states (e.g., population immunity), 21 disease transmission (e.g., vaccine protection), 9 human behavior (e.g., voluntary testing compliance), 22 and the unpredictable and changing nature of the pandemic 23 further amplifies model inaccuracy. 24 While predictive models can be useful for comparing the relative effectiveness of interventions, 13,25,26 inaccurate point estimates for disease incidence can ultimately complicate institutional decision making and policy formation. 27 Accurate case projections are needed to inform institutional resource planning and procurement, including testing kits, isolation beds, ventilators, staffing, etc. 5,11,28 Fortunately, many large institutions have rich data sources that can directly estimate input parameters to guide predictive models. Such modeling frameworks allow institutions to make informed decisions that better prioritize the health and safety of their communities while minimizing operational disruptions.
In this study, we describe the development and implementation of a novel epidemiological modeling toolkit for institutional Covid-19 surveillance, prediction, resource procurement, and evaluation of mitigation strategies for implementation of institutional policies. This modeling framework formed the basis for Clemson University's decision-making throughout the Covid-19 pandemic. A novel feature of our toolkit is the utilization of the entire pipeline of institutional data in all stages of the modeling framework, including 1) estimation of local disease surveillance metrics, 2) statistical modeling of local disease transmission dynamics, and 3) compartment-based modeling framework for Covid-19 prediction based on input parameters estimated in 1), 2), and publicly available data; see Fig. 1. We argue that this strategy helps minimize some of the uncertainties in model input parameters presented in the broader literature, and demonstrate that this institutional-level modeling toolkit can accurately predict the number of Covid-19 cases, inform resource procurement, and evaluate the relative effectiveness of mitigation measures. Moreover, the generalized version of this (publicly available) toolkit can yield reasonably accurate predictions in other university settings. The input parameters of this toolkit are easily adaptable to other institutional settings during (respiratory) infectious disease outbreaks.

Model Structure
For each affiliate subpopulation (in-state residential student, out-of-state residential student, non-residential student, faculty, staff, community), individuals are assigned into an immunity (or protection) level: no immunity, previous SARS-CoV-2 infection only, full vaccination, boosted, full vaccination with previous infection, boosted with previous infection (additional detail provided in Methods and Supplementary Text). Within each affiliate/immunity level subpopulation, individuals are placed in one of the compartments detailed in Fig. 1 (additional detail provided in Methods and Supplementary Text). Details on statistical models used to estimate protection against SARS-CoV-2 infection by immunity level are provided in Methods and Supplementary Text. Additional detail on estimation of protection parameters, disease transmission and transition parameters, including those derived from scientific literature or institutional protocol, is provided in Supplementary Text. Initial compartment states and disease transmission/transition parameters are then inserted as input parameters into the compartment-based modeling (CBM) framework. The CBM provides predictions of the weekly number of cases and infection rates, the daily number of isolated individuals, and the daily number of isolated and quarantined individuals (by affiliate subpopulation). In addition, the toolkit also displays a summary of the initial states and the estimated disease transmission dynamics. A step-by-step tutorial of this publicly available toolkit is included as a supplement to this article.
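To make the idea of stratifying a compartment model by immunity level concrete, the following is a minimal sketch and not the toolkit itself. It assumes a generic S/E/I/R structure (the actual compartments are those in Fig. 1), a single shared force of infection, and a per-group `protection` factor that scales down infection risk. The class `Group`, the function `step`, and every numeric value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One immunity-level group; all fields are illustrative assumptions."""
    name: str
    protection: float  # assumed relative reduction in infection risk (0 to 1)
    S: float           # susceptible
    E: float = 0.0     # exposed, not yet infectious
    I: float = 0.0     # infectious
    R: float = 0.0     # recovered or removed


def step(groups, beta, sigma, gamma, dt=1.0):
    """Advance every group by one forward-Euler step of length dt (days)."""
    total = sum(g.S + g.E + g.I + g.R for g in groups)
    infectious = sum(g.I for g in groups)
    force = beta * infectious / total  # force of infection shared by all groups
    for g in groups:
        new_exposed = dt * force * (1.0 - g.protection) * g.S
        new_infectious = dt * sigma * g.E
        new_removed = dt * gamma * g.I
        g.S -= new_exposed
        g.E += new_exposed - new_infectious
        g.I += new_infectious - new_removed
        g.R += new_removed


# Hypothetical initial states and rates, not values from the study.
groups = [
    Group("no immunity", protection=0.0, S=990.0, I=10.0),
    Group("boosted", protection=0.5, S=1000.0),
]
for _ in range(35):  # five weeks of daily steps
    step(groups, beta=0.6, sigma=0.25, gamma=0.2)

# Share of each group that is no longer susceptible after five weeks.
attack = {g.name: 1.0 - g.S / (g.S + g.E + g.I + g.R) for g in groups}
```

In this sketch the group with higher assumed protection ends the period with a smaller non-susceptible share, which is the qualitative behavior the immunity-level stratification is meant to capture.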

Main analysis - Clemson University Analysis (Spring 2022)
There were 27,516 individuals in the main-campus population, including 22,634 students (4,853 in-state residential students, 2,265 out-of-state residential students, 15,516 non-residential students) and 4,882 employees (1,611 faculty, 3,271 staff). Also included were 17,681 from the local community. 29 The residential population was split into in-state and out-of-state, since out-of-state residential students were more likely to use university-provided housing (if SARS-CoV-2 positive) due to travel restrictions. Students and employees were subject to mandatory arrival testing and weekly surveillance testing during in-person instruction. Initial values for students and employees in each compartment are based on empirical data with adjustments for underreporting (Table S1) at the start of the prediction period (January 10, 2022). During this period, the omicron BA.1 variant accounted for 99.2% of SARS-CoV-2 cases in South Carolina. 9 Estimated student and employee disease prevalence at baseline (January 6th through 9th) was 15.1% and 4.8%, respectively. The number of individuals in each immunity level, along with estimated protection by immunity level, is provided in Table S9. The disease reproductive number for each subpopulation was validated using empirical data from the Spring and Fall 2021 semesters and published literature (Methods and Supplementary Appendix 1). Predicted SARS-CoV-2 cases under weekly surveillance testing for students and employees during the 5-week follow-up period (January 10 - February 13, 2022) are provided in Fig. 2. Observed cases represent the total number of tests with positive results during the indicated prediction period. Predicted cases represent the total number of students and employees who tested positive during the indicated prediction period. Total predicted student and employee cases (%) during this 5-week period were 4,947 (21.9%) and 891 (19.2%), respectively. 
Total observed cases (%) for these populations were 4,876 (21.5%) and 876 (17.9%), respectively.
Further, the percent agreement for total detected cases was 98.6% for students and 93.2% for employees. In addition, the percent agreement for the peak number of weekly detected cases was 81.9% for students (observed N = 2,035; predicted N = 1,667) and 79.5% for employees (observed N = 308; predicted N = 245). The predicted peak for students coincided with the observed peak at Week 1 (Jan. 10-16), but the predicted peak for employees occurred a week later than the observed peak.
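The text does not state how percent agreement is computed. One definition that reproduces the peak-week figures quoted above is the ratio of the smaller count to the larger count, expressed as a percentage. The sketch below uses that definition as an assumption, and the function name `percent_agreement` is hypothetical.

```python
def percent_agreement(observed: float, predicted: float) -> float:
    """Smaller count divided by larger count, in percent (assumed definition)."""
    if observed <= 0 or predicted <= 0:
        raise ValueError("counts must be positive")
    return 100.0 * min(observed, predicted) / max(observed, predicted)


# Peak weekly detected cases reported for Spring 2022.
student_peak = percent_agreement(observed=2035, predicted=1667)
employee_peak = percent_agreement(observed=308, predicted=245)
```

A symmetric ratio of this kind stays between 0 and 100 whether the model overpredicts or underpredicts, which makes it convenient for comparing settings where the direction of the error differs.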
Observed and predicted students in isolation over the 5-week prediction period are presented in Fig. 3. Clemson University's Isolation and Quarantine (I/Q) policies were based on the latest CDC guidelines. 30 We are interested in the maximum number of students in isolation, since this is directly linked to procurement of rooms. Predicted and observed peak isolation count was 1,710 and 1,881, respectively, corresponding to an agreement of 90.9%. Of particular interest is the residential student population, since this population lives in congregated housing and therefore cannot isolate/quarantine in place. Among residential students, predicted and observed peak isolation count was 673 and 649 (% agreement: 96.3%). In addition, among out-of-state residential students, predicted and observed peak isolation count was 264 and 194 (% agreement: 73.5%).
There was some daily variation in observed peak isolation (relative to predicted). Of note is the discrepancy between peak capacity towards the end of week 2 (predicted peak: 1,086, observed peak: 1,515; agreement: 72%). This was primarily due to daily fluctuation in student testing schedules and limited weekend testing, which was not incorporated into the modeling framework.
Prior to the start of each semester, we were tasked with evaluating the impact of testing strategies on mitigating disease spread. This question has been extensively studied for previous variants (prior to omicron), with studies concluding that testing at least once per week is sufficient for mitigating disease spread. 12,13 Here we compare the projected cases during the five-week projection period under four different testing strategies: weekly, bi-weekly, monthly, and voluntary testing. We consider two time periods: Spring 2022 semester (omicron BA.1 variant) and Fall 2022 semester (omicron BA.5 variant).
For voluntary testing, we estimated that only 10% of total SARS-CoV-2 infections would be detected for students and 15% for employees. Results for the Spring 2022 semester are presented in Fig. 4. Weekly testing led to 1.10, 1.50, and 2.57 times more detected student cases compared to bi-weekly, monthly, and voluntary testing (weekly: 4,947, bi-weekly: 4,492, monthly: 3,293, voluntary: 1,928) and 1.02, 1.30, and 1.92 times more detected employee cases compared to bi-weekly, monthly, and voluntary testing (weekly: 891, bi-weekly: 871, monthly: 688, voluntary: 463). The opposite was true for total cases (both symptomatic and asymptomatic). Here, voluntary testing led to 1.65, 1.19, and 1.06 times more total student cases compared to weekly, bi-weekly, and monthly testing (weekly: 5,669, bi-weekly: 7,859, monthly: 8,851, voluntary: 9,379) and 1.79, 1.29, and 1.10 times more total employee cases compared to weekly, bi-weekly, and monthly testing (weekly: 1,206, bi-weekly: 1,671, monthly: 1,954, voluntary: 2,153). Based on these findings, Clemson University continued with weekly testing during the first half of the Spring 2022 semester. While similar (relative) trends were observed when comparing testing strategies prior to the Fall 2022 semester (Fig. S1), overall predicted cases were lower under the four testing strategies. This is primarily due to the substantial increase in population immunity from the Omicron BA.1 variant, which resulted in a lower susceptible population. 9,31

Extension to other institutions and time periods
We generalize the modeling framework above to obtain predictions in three other settings. The first two projections were conducted for the University of Georgia (UGA) and Pennsylvania State University (PSU) during the Spring 2022 semester. These institutions were natural choices for external validation, as both are land-grant universities with publicly accessible data on weekly Covid-19 cases. 
Because institutional vaccination data was unavailable, we used literature-based estimates of vaccine protection for these populations (Table S7). The third set of projections utilized the generalized modeling framework for predictions at Clemson University during the Fall 2022 semester (omicron BA.5 variant). For UGA and PSU, we obtained the total number of students and employees in each university and the number of infections during the week prior to the prediction start (January 10th, 2022) from institutional websites and Covid-19 dashboards. 32,33 Because UGA and PSU did not implement mandatory surveillance testing, reported Covid-19 cases are from voluntary testing and therefore overall case prevalence is underreported. We adjust these estimates by an (estimated) constant to obtain the asymptomatic/undetected infection rate at baseline (see Methods and Supplementary Appendix 1). Due to lack of information on vaccination and previous infection rates, we estimate these quantities using a combination of Clemson institutional data and data from the Centers for Disease Control and Prevention (CDC). 34 The calculation of subpopulation sizes and other details are provided in Supplementary Appendix 1.
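The adjustment for underreporting under voluntary testing can be illustrated with a simple scaling. The sketch below assumes the constant acts as a detection fraction, so that estimated infections equal reported cases divided by that fraction. The actual constant and its estimation are described in Methods and Supplementary Appendix 1. The function names, the reported count of 150, and the population of 30,000 are hypothetical. The 10% detection fraction echoes the voluntary-testing assumption for students stated earlier and is used here only as an example value.

```python
def adjust_for_underreporting(reported_cases: float, detection_fraction: float) -> float:
    """Scale reported cases up to estimated infections (assumed form of the adjustment)."""
    if not 0.0 < detection_fraction <= 1.0:
        raise ValueError("detection_fraction must be in (0, 1]")
    return reported_cases / detection_fraction


def baseline_prevalence(reported_cases: float, detection_fraction: float, population: float) -> float:
    """Estimated share of the population infected at baseline."""
    return adjust_for_underreporting(reported_cases, detection_fraction) / population


# Hypothetical inputs: 150 reported cases, 10% of infections detected, 30,000 people.
estimated_infections = adjust_for_underreporting(150, 0.10)
prevalence = baseline_prevalence(150, 0.10, 30_000)
```

Because the adjustment is a single multiplicative constant, any error in the assumed detection fraction carries over proportionally into the initial infectious compartment.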
We used our toolkit to predict the number of weekly cases and the maximum number of weekly cases for university students and employees at UGA and PSU over the 5-week period (January 10 to February 13, 2022). The results are provided in Table 1. Additional information on the initial values, estimated individuals in each protection level, and model input parameters is given in the Supplementary Materials (Table S3-4, S7, S10-11). The percent agreement for the total detected cases over the prediction period was 96.7% for UGA (observed N = 2,550; predicted N = 2,467) and 89.5% for PSU (observed N = 1,708; predicted N = 1,983). In addition, we examined the peak number of cases during the five weeks, as this informs decisions on health resources (isolation beds, meals, medical staff, contact tracers, etc.). The percent agreement for peak weekly cases was 65.4% (observed N = 1,003; predicted N = 656) for UGA and 75.6% (observed N = 631; predicted N = 477) for PSU. In both scenarios, the predicted peak occurred one week after the observed peak.

Clemson University Analysis (Fall 2022)
In the third extension, we use the model to project the number of cases and number in isolation for the beginning of the Fall 2022 semester (see Tables S5 and S8).
There were 24,264 individuals in the main-campus population, including 19,082 students (4,670 in-state residential students, 2,323 out-of-state residential students, 12,089 non-residential students) and 5,183 employees (1,754 faculty, 3,429 staff). Estimated student and employee disease prevalence at baseline was 29.3% and 14.1%, respectively. The number of individuals in each immunity level, along with estimated protection by immunity level, is provided in Table S12. Predicted Covid-19 symptomatic infections for students and employees during the follow-up period are provided in Fig. 5.
Predicted student and employee symptomatic infections (% of population) during this 5-week period were 644 (3.4%) and 183 (3.6%), respectively. Total observed cases (% of population) for these populations were 636 (3.3%) and 118 (2.2%), respectively. Figure 5 provides a weekly comparison between the projected and observed number of detected cases during the five-week prediction period. The percent agreement for total detected cases was 98.8% for students and 64.5% for employees. In addition, the percent agreement for the peak number of weekly detected cases was 61.0% for students (observed N = 254; predicted N = 155) and 40.7% for employees (observed N = 33; predicted N = 81). The predicted peak for students occurred two weeks later than the observed peak, and for employees one week prior to the observed peak.

Input parameter sensitivity
Sensitivity of predictions to model input parameters has been extensively studied for Covid-19. 12,13,37,38 In this section, we explore sensitivity to some of the parameters unique to our modeling framework. One novel feature is accounting for protection from previous infection. We conduct a sensitivity analysis that removes this feature by assuming no protection from previous infection. In all settings, cases were substantially overestimated (range: 5.7-62.7%, see Table S13-S15). At Clemson University, ignoring protection from previous infection would have led to an estimated increase in necessary I/Q capacity of 137.7% during the Fall 2022 semester, but is estimated to have had no impact on I/Q during the Spring 2022 semester (which is expected, since previous infection offered little protection against the omicron BA.1 variant).
In addition, there are many individuals whose infection history is unknown. We overcome this limitation by estimating the number of individuals who were previously infected by omicron but not recorded in institutional databases. If we instead assume that no previously infected individuals were missed, this leads to substantial overestimation in the number of predicted cases (range across scenarios: 64.2-343.0%, see Table S13-S15).
At multiple periods throughout the pandemic, this toolkit was used to inform the removal of mitigation measures, including social distancing requirements, mask mandates, and mandatory testing. Because it is difficult to model the precise impact of a masking or social distancing mandate, we instead compared predicted cases under two scenarios: strong effect of the mitigation measure versus no effect of the mitigation measure. For example, our team was tasked with evaluating the impact of the classroom mask mandate mid-way through the Spring 2022 semester (after the omicron BA.1 wave had subsided). To evaluate sensitivity of model predictions to changes in mitigation measures, we incorporated six daily time steps (4 hours each) into our model. Under the reference setting (corresponding to 4 weekday time steps), which was assumed to represent non-work or non-school hours, we assumed minimal contact between students and employees or community members. 13 During class hours (1 weekday time step) and work/study hours (1 weekday time step), we assumed increased contact between students and faculty, but decreased rates of transmission. Weekend time steps assumed increased transmission rates and higher contact rates between students and employees with community members. Transmission rates across time steps were calibrated to correspond to reference transmission levels (on average). Full detail on the contact network matrix and transmission rates by time step are provided in Supplementary Appendix 1.
Assuming masks decrease disease transmission by 50%, 39 we conservatively assume an absence of a mask mandate will double transmission during the classroom time step. During the first 5 weeks of the Spring 2022 semester, removing the mask mandate would have led to an estimated increase of 171 student cases and 119 employee cases. During the first 5 weeks of the Fall 2022 semester, implementing the mask mandate would have led to a decrease of 15 student cases and 9 employee cases. Negligible differences in Fall 2022 are not surprising given that the majority of high-density social interactions occur outside of the classroom. Since Covid-19 prevalence was relatively low compared to previous stages of the pandemic and a large majority of the population had protection from previous infection or vaccination, a mask mandate implemented during a period of the day in which social contact was reduced would have minimal impact on overall disease spread.
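The time-step structure and the mask-mandate scenario can be sketched as a small calculation. The sketch assumes a week of 42 four-hour steps: on each of 5 weekdays, 4 reference steps, 1 class step, and 1 work/study step, and on each of 2 weekend days, 6 weekend steps treated identically. The relative transmission multipliers in `raw` are hypothetical placeholders, not the calibrated values from Supplementary Appendix 1. Calibration is taken to mean rescaling so the weekly average multiplier equals 1, and removing the mandate is modeled as doubling the class-step multiplier.

```python
# Number of 4-hour steps of each type in one week (assumed layout).
STEPS_PER_WEEK = {"reference": 5 * 4, "class": 5 * 1, "work": 5 * 1, "weekend": 2 * 6}

# Hypothetical relative transmission by step type, before calibration.
raw = {"reference": 1.0, "class": 0.5, "work": 0.5, "weekend": 1.5}


def weekly_mean(multipliers):
    """Average multiplier over all steps in the week."""
    total_steps = sum(STEPS_PER_WEEK.values())
    weighted = sum(multipliers[k] * STEPS_PER_WEEK[k] for k in STEPS_PER_WEEK)
    return weighted / total_steps


def calibrate(multipliers):
    """Rescale so the weekly average equals the reference level of 1."""
    mean = weekly_mean(multipliers)
    return {k: v / mean for k, v in multipliers.items()}


with_mandate = calibrate(raw)

# No mandate: transmission during the class step is doubled, other steps unchanged.
without_mandate = dict(with_mandate)
without_mandate["class"] = 2.0 * with_mandate["class"]

relative_increase = weekly_mean(without_mandate) / weekly_mean(with_mandate) - 1.0
```

With these placeholder values, doubling transmission in the class step raises the weekly average by about 6%, because class hours make up only 5 of 42 steps and carry a below-average multiplier. This is the same mechanism the text cites for the small effect of the classroom mandate.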
Our results were not overly sensitive to the choice of contact network structure. To assess sensitivity to assumptions of contact network, we increased contact rates between students and employees/community members by 25%. This led to a decrease of 21 student cases and an increase of 13 employee cases in Spring 2022 and a decrease of 6 student cases and an increase of 3 employee cases in Fall 2022.

DISCUSSION
The modeling framework presented in this study was directly used to inform resource allocation and decision making around both implementing, and removing, mitigation measures at Clemson University beginning in the Fall 2020 semester. Early versions of this modeling framework helped inform the number of Covid-19 testing kits needed for arrival and surveillance testing strategies, phased reopening strategies, and the number of necessary isolation/quarantine rooms prior to reopening in the fall of 2020. 5,11,12 Due to the changing nature of the pandemic, including added protection from previous infection, 40 vaccination, 41 and the introduction of new SARS-CoV-2 variants which altered disease transmission dynamics, 9,42 our toolkit was continuously modified and calibrated to evaluate effective testing strategies in future semesters.
Beginning in summer of 2021, this toolkit was also used to scale back testing strategies and other mitigation measures that were projected to have a small impact on disease spread. For example, the weekly Covid-19 testing mandate for student and employee populations was not predicted to have a substantial impact on disease spread during summer 2021 due to strong protection from vaccination and previous infection combined with low disease prevalence.
Findings of reduced impact of mitigation measures during periods of low disease prevalence in IHE settings are consistent with other settings. 43 The testing mandate was subsequently removed during this time, but reimplemented at the start of the Fall 2021 semester as the Delta variant began circulating. 41 The weekly testing mandate was again removed after the omicron (BA.1) wave had subsided in mid-spring of 2022.
Utilizing a contact matrix that broke down social contact patterns and disease transmission by time of day, day of week, and between student, employee, and community populations, we were able to evaluate sensitivity to additional mitigation measures including on-campus social distancing and mask mandates.
For example, we projected that social distancing policies had little impact on overall transmission rates due to the majority of social interactions, and hence disease transmission, occurring off campus or in residential halls. Similarly, the toolkit showed that when disease prevalence is low and protection in the population is high, classroom mask mandates no longer had a substantial impact on overall cases due to low adherence to masking off-campus (where the majority of transmission occurs).
Utilizing the entire pipeline of Clemson institutional data, our toolkit was able to predict cases with high accuracy (students: 98.6%, employees: 93.2%). Furthermore, incorporating input parameter estimates based on Clemson data yielded high prediction accuracy for total Covid-19 cases at other institutions (UGA: 96.7%, PSU: 89.5%). Lower prediction accuracy for PSU relative to UGA may be explained by the relatively closer demographic similarities between Georgia and South Carolina. When replacing institutional-level estimates of disease transmission parameters with literature-based estimates, the modeling toolkit still yielded fairly accurate predictions for the omicron BA.5 variant during the Fall 2022 semester at Clemson University for students (accuracy: 98.8%) but overestimated total employee cases (accuracy: 64.5%).
In addition to predicting the total number of cases, the toolkit was reasonably accurate in predicting the maximum number of isolations at Clemson University during the Spring 2022 semester (90.9% accuracy) and Fall 2022 semester (79.5% accuracy). At Clemson University, this had important implications for procuring sufficient isolation/quarantine rooms from Fall 2020 through Spring 2022. Based on these predictions, the university procured an off-campus hotel that could house over 800 students.
Due to unavailability of isolation/quarantine data at other institutions, we predicted the peak number of weekly cases and the timing of the peak as a surrogate for total isolations each week. Prediction accuracy ranged from 79.9-83.3%. While reasonable for model-based predictions, the model underestimated the maximum number of weekly infections by 17-20%. Furthermore, the predicted timing of the peak was off by one week. However, this has few implications for decision making, as isolation/quarantine rooms must be procured well in advance.
One of the biggest factors leading to more precise predictions was the ability of the modeling toolkit to accurately estimate initial model states and protection from previous infection. In particular, there are a substantial number of individuals in this population with unrecorded previous infections, which has a substantial impact on predictions in IHE 13 and other settings. 44 Specifically, we showed that ignoring these features leads to underestimating the amount of immunity in the population and thus substantially overestimating the number of infections.
Similar to other studies conducted prior to introduction of the Omicron variant, we found that high-frequency testing was effective in reducing SARS-CoV-2 transmission. 12,13 This finding was consistent throughout each semester despite the introduction of more transmissible variants and the introduction of effective vaccinations, 41 as the impact of higher transmission was offset by increased protection in the population. 40,41 However, the introduction of the omicron variant that plagued the nation in early 2022 complicated selection of optimal testing strategies, since increased disease transmission and lower vaccine protection 9 reduced the effectiveness of weekly testing strategies relative to previous variants. While institutions could theoretically increase the frequency of testing, this would have required procuring additional testing kits, lab equipment, and personnel in a relatively short time period. Without sufficiently scaling up in a timely manner, which was unrealistic for many institutions in the month between introduction of the Omicron variant and the start of Spring 2022 semester, an increase in frequency of testing would have caused a significant lag in test diagnostics, thus allowing infectious individuals to transmit the disease for a longer period of time and potentially reducing the effectiveness of the testing strategy. 45

Extension to other institutional settings
Our modeling framework can be directly applied to other institutional settings. Large health care systems or hospitals are the most natural setting for extension, since such institutions are both impacted by, and required to respond to, health emergencies. 16 Furthermore, such institutions have agency to implement their own policies and presumably have access to most, if not all, of the necessary data sources. 
Even without the entire pipeline of institutional data, our modeling framework is fairly accurate for external predictions in IHE settings through extrapolation of Clemson institutional data or through use of publicly available CDC/Census data in conjunction with literature-based estimates for input parameters. Our modeling toolkit can serve large workforces and other private or public institutions, including K-12 schools, requiring updates to initial state input parameters that reflect subpopulations in each institution.
However, for each institutional setting, the current IHE-based contact network matrix would need to be updated to reflect reasonable assumptions for that institution. Importantly, future adaptations of this framework may benefit from leveraging digital traces to estimate contact networks and transmission. [46][47][48]

Extension to other diseases
Our proposed toolkit is readily adaptable to other respiratory infectious diseases. This would require data sources relevant to the disease of interest or literature-based estimates. The estimation framework for model input and disease transmission parameters, and compartments in the prediction framework, would remain the same. For non-respiratory infectious diseases, additional modifications to the compartments would also be needed.

Limitations
Our proposed modeling framework faces many of the limitations shared by other modeling studies. First, the high prediction accuracy of our toolkit does not imply that estimates of model input parameters and disease transmission parameters are necessarily accurate. Due to the large number of parameters, there are likely several reasonable combinations of parameters that yield similar predictions. This can have important implications for model predictions given strong sensitivity to input parameters. 13 In our framework, we attempted to minimize the impact of parameter uncertainty through estimation of influential model parameters using over 1 million data records, internal validation, and external validation through comparison to estimates in the published literature. As an extension to this modeling framework, a stochastic component can be incorporated to provide credible intervals for predicted point estimates in order to account for uncertainty in model input parameters. 13
Additional limitations of our modeling framework include the simplifying assumptions often made in compartment-based modeling, including homogeneity of input parameters within each subpopulation, uniform transmission rates over the infectivity period that do not vary by days since infection or severity of infection, and assuming the community is a homogeneous population. To reduce the impact of homogeneous populations, we split the populations into subpopulations including non-residential and residential students (both in-state and out-of-state), faculty, staff, and community. The contact network structure for these subpopulations was based on reasonable approximations from existing literature and input from university students, faculty, staff, and administrators. However, validation of the proposed network structure is not feasible due to parameter identifiability issues previously discussed. 
While model predictions were not overly sensitive to the choice of contact network structure in the IHE setting of this study, such features may not translate to other institutional settings.
Due to underreporting of booster doses at Clemson University, use of Clemson vaccination data to define protection levels yields 1) a boosted group containing only a fraction of the individuals receiving a booster dose and 2) a fully vaccinated group containing a mix of fully vaccinated and boosted individuals. We therefore supplemented analyses based on Clemson vaccination data with CDC-based estimates, which yielded similar results. Given the population-averaged nature of compartment-based models, this finding is not surprising, because institutional data are used to estimate both vaccine protection and vaccination groups. Vaccine protection is estimated from this mixed population and therefore represents a weighted estimate of vaccine effectiveness in fully vaccinated and boosted individuals, which limits the downstream impact of misclassification on predictions.
However, prediction accuracy may not translate to future waves of the Covid-19 pandemic. For example, estimation of population-level immunity from previous infection will become more difficult given the decrease in testing and the use of at-home testing kits. 49,50 One potential solution in the absence of reliable data or estimation is to simplify the model through merging of compartments. 24 Examples include merging asymptomatic and symptomatic infections into a single infectious compartment, merging vaccination groups, or merging previously infected individuals into the reference compartment. While such a shift does not directly mimic the natural course of disease progression, reasonable predictions can still be obtained given that compartment-based models are population-averaged models to begin with. Studies suggest that in the absence of reliable data for model input parameters (including initial states and disease transmission/transition parameters), this strategy can result in improved prediction accuracy. 24,51 Even if prediction accuracy is reduced, previous studies have shown that evaluation of mitigation measures can be robust to variation of model input parameters. 12,13

Conclusions
The institutional modeling framework developed in this study is informative for disease monitoring and projections, procurement and allocation of resources, and intervention implementation, and the publicly available modeling toolkit can be directly used to guide institutional-level decision-making. Covid-19 is unlikely to be the last pandemic in our lifetime. It is very possible that high impact pathogens, including coronaviruses and influenza A viruses, will emerge and reemerge. 52 The methodological approach presented here advances the field of public health preparedness and response by improving the ability of institutions to make data-informed decisions that better prioritize the health and safety of their communities while minimizing operational disruptions.
Institutions must therefore be prepared and ensure that proper data collection and processing protocols are in place. In the event of a future respiratory infectious disease outbreak, our proposed modeling framework can easily be adapted to inform decision-making in large institutional settings.

METHODS
All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by the Institutional Review Board of Clemson University (IRB # 2021-043-02). Informed consent was waived for this study; students consented to being tested and voluntarily uploaded vaccination information, and de-identified data were used for these analyses.
Individuals are considered as having no protection from vaccination if they are either unvaccinated or have received only one dose of an mRNA vaccine.
During the Fall 2021 semester, the university created a Covid-19 vaccine upload toolkit and provided strong financial incentives to individuals uploading proof of complete vaccination. While data on whether an individual received full vaccination was likely captured with high accuracy, 41 data on the number of individuals with a booster dose is subject to underreporting. 9 Therefore, the fully vaccinated group in the compartment-based modeling framework likely contains a mixed population of fully vaccinated and boosted individuals. 9 Because estimated protection for the fully vaccinated group is based on this population as well, the resulting downstream bias in model prediction is expected to be minimal. We assess sensitivity to this assumption by replacing institutional-level estimates of the number of boosted individuals for each population with CDC demographic data on vaccination rates by age group, and by replacing institutional-level estimates of protection with literature-based estimates.

Isolation/Quarantine
Student isolation and quarantine were tracked using a management system, including the software Atlassian Jira. 53 The data application and collection processes are illustrated in McMahan et al. (Figure S1). 54 Ethical review for this study was obtained from the Institutional Review Board of Clemson University (IRB # 2021-043-02).
Additional data sources are provided in Table 2.

Compartment-based model
We developed a metapopulation compartmental model that projects weekly SARS-CoV-2 cases, symptomatic cases, and daily isolations and quarantines.
This model generalizes the metapopulation SEIR model. 55 A diagram of the dynamics across all compartments is presented in Fig. 1.
Each compartment comprises six subpopulations: in-state residential students, out-of-state residential students, non-residential students, faculty, staff, and community. In addition, each compartment is indexed by j = 0, 1, …, 5, representing six protection levels defined by vaccination status and previous infection. Within each protection level, individuals are assigned to one of the following compartments at baseline: susceptible individuals (S_j); individuals exposed to the disease but not yet infectious (E_j); symptomatic or asymptomatic/mild infectious individuals; exposed or infectious individuals testing positive; individuals in isolation housing (H_j); quarantined close contacts of infected individuals who did not contract the disease and remain susceptible; quarantined close contacts of infected individuals who were exposed to the disease; and recovered individuals (R_j), comprising all individuals no longer infectious or susceptible to the disease during the follow-up period. Projections were carried out using the forward Euler method. Each day is divided into six time steps of four hours each. Details of all model equations of the forward Euler method are provided in Table S1.
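The forward Euler scheme with six time steps per day can be illustrated with a minimal sketch. The code below assumes a single-population SEIR model rather than the full metapopulation model with testing, isolation, and quarantine compartments, and all parameter values are illustrative assumptions, not estimates from this study.

```python
# Minimal single-population SEIR projected with forward Euler.
# Simplified stand-in for the full model; parameter values are illustrative only.

STEPS_PER_DAY = 6            # six 4-hour time steps per day, as described in the text
DT = 1.0 / STEPS_PER_DAY     # step size in days


def euler_step(state, beta, sigma, gamma, dt=DT):
    """Advance (S, E, I, R) by one forward Euler step of length dt days."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt     # S -> E
    new_infectious = sigma * e * dt         # E -> I
    new_recovered = gamma * i * dt          # I -> R
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def project(state, days, beta, sigma, gamma):
    """Return the list of states at the start and at the end of each day."""
    daily = [state]
    for _ in range(days):
        for _ in range(STEPS_PER_DAY):
            state = euler_step(state, beta, sigma, gamma)
        daily.append(state)
    return daily


# Five-week (35-day) projection from a hypothetical initial state.
trajectory = project((9990.0, 0.0, 10.0, 0.0), days=35,
                     beta=0.6, sigma=1 / 3, gamma=1 / 5)
```

Because every flow is subtracted from one compartment and added to another, the total population is conserved at each step, which is a convenient sanity check when more compartments are added.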
Since the five-week projection period is short, we assume that there is no transition from one protection level to another during the projection period. Specifically, there is no transition from unvaccinated to fully vaccinated or from fully vaccinated to boosted. For instance, unprotected susceptible individuals remain in the unprotected level throughout the projection period. In addition, we also assume that symptomatic individuals are voluntarily tested and automatically moved to isolation housing. On the other hand, asymptomatic individuals are only tested under mandatory testing policies. The implication is that under a voluntary testing strategy the detected cases are all symptomatic, while under mandatory testing the detected cases include both symptomatic and asymptomatic cases.

Transmission
Transmission is governed by the basic reproductive number (R_0), contact matrix, and infectivity period. For the no immunity group, R_0 is computed by affiliation subpopulation for each SARS-CoV-2 variant based on scientific literature and is internally validated using institutional data. Transmission in the no immunity group is modeled by the parameter β_0 = R_0 × ϕ, where 1/ϕ is the infectivity period. 56 For the other immunity groups j = 1, 2, …, 5, the transmission parameter is β_j = β_0 × (1 - r_j), where r_j is the estimated protection for level j (estimation discussed in the next section). These parameters, along with the contact network matrix, are adjusted to reflect time-dependent changes within and between subpopulations. The model time steps correspond to time of day and day of week in order to reflect varying social engagements, including time spent in class, at work, and on weekends.
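As a worked illustration of these relationships, the sketch below computes the transmission parameter for the no immunity group and scales it down by the estimated protection of each other level. The function name and the numerical inputs are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def transmission_rates(r0, infectious_days, protection):
    """Return [beta_0, beta_1, ..., beta_J].

    r0              -- basic reproductive number for the no immunity group
    infectious_days -- infectivity period in days (1/phi in the text)
    protection      -- estimated protection for levels 1..J, each in [0, 1]
    """
    phi = 1.0 / infectious_days
    beta0 = r0 * phi                                  # beta_0 = R_0 * phi
    return [beta0] + [beta0 * (1.0 - p) for p in protection]


# Hypothetical inputs: R_0 = 5, a 5-day infectivity period, five protection levels.
betas = transmission_rates(r0=5.0, infectious_days=5.0,
                           protection=[0.2, 0.5, 0.4, 0.6, 0.8])
```

With these inputs the unprotected group has a transmission parameter of 1.0 per day, and a level with 50% estimated protection has half of that.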
R_0 for each affiliation in the Spring '22 analysis is validated using testing data collected during the Fall '21 semester. Holding all other parameters constant, we searched for the optimal R_0 that minimizes the mean squared error between the projected cases and the observed cases in Fall '21.
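The search for an optimal R_0 can be sketched as a one-dimensional grid search. The example below is a simplified illustration under stated assumptions: `weekly_cases` is a toy SIR projection standing in for the full compartment-based model, the grid is hypothetical, and the "observed" series is synthetic data generated from a known R_0 so that the search can be checked.

```python
def weekly_cases(r0, weeks=5, n=10000.0, i0=20.0,
                 infectious_days=5.0, steps_per_day=6):
    """Toy SIR projection returning new infections per week (stand-in model)."""
    gamma = 1.0 / infectious_days
    beta = r0 * gamma
    dt = 1.0 / steps_per_day
    s, i = n - i0, i0
    out = []
    for _ in range(weeks):
        s_start = s
        for _ in range(7 * steps_per_day):
            new_inf = beta * s * i / n * dt
            s, i = s - new_inf, i + new_inf - gamma * i * dt
        out.append(s_start - s)          # infections accrued during the week
    return out


def mse(predicted, observed):
    """Mean squared error between two equally long series."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def calibrate_r0(observed, grid):
    """Return the grid value of R_0 with the smallest mean squared error."""
    return min(grid, key=lambda r0: mse(weekly_cases(r0), observed))


grid = [k / 10 for k in range(10, 51)]   # candidate R_0 values 1.0, 1.1, ..., 5.0
observed = weekly_cases(2.5)             # synthetic data, for demonstration only
best_r0 = calibrate_r0(observed, grid)
```

All other parameters are held fixed inside `weekly_cases`, mirroring the description above; a finer grid or a scalar optimizer could replace the grid without changing the logic.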

Estimated protection
In the main analysis (Clemson University Spring '22), we estimated the protection r_j due to vaccination and/or previous infection using a Cox proportional hazards model. The outcome was the testing result during the pre-arrival testing period prior to semester start, between December 31, 2021 and January 9, 2022. Information on vaccination status and previous infections prior to January 9, 2022 was collected from institutional data. To account for the differences between students and employees, we fitted two separate models.
For the i-th subject, the hazard function depends on three covariates: V_i, an indicator for fully vaccinated without booster; B_i, an indicator for boosted; and P_i, an indicator for previously infected. Based on preliminary analyses, the interaction between vaccination status and previous infection is not statistically significant (student P-values: , ; employee P-values: , ). Hence the effects due to vaccination and due to previous infection are modeled as additive.
For protection level j = 1, …, 5, the estimated protection is given by r_j = 1 - hr_j, where hr_j is the hazard ratio relative to the unprotected individuals.
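Under an additive specification with the three indicators for vaccination, booster, and previous infection, the hazard and the resulting hazard ratios would take the standard Cox form sketched below. The coefficient names (α_V, α_B, α_P) and the labeling of the protection levels are assumptions made for illustration; the exact notation and ordering used by the authors are not recoverable from the text.

```latex
h_i(t) = h_0(t)\,\exp\left(\alpha_V V_i + \alpha_B B_i + \alpha_P P_i\right)

\begin{aligned}
hr_{\text{vaccinated}} &= e^{\alpha_V}, &
hr_{\text{boosted}} &= e^{\alpha_B}, &
hr_{\text{previously infected}} &= e^{\alpha_P}, \\
hr_{\text{vaccinated, previously infected}} &= e^{\alpha_V + \alpha_P}, &
hr_{\text{boosted, previously infected}} &= e^{\alpha_B + \alpha_P}
\end{aligned}
```

Each hazard ratio is taken relative to unprotected individuals (all three indicators equal to zero), and the estimated protection for a level is one minus its hazard ratio.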

Contact matrix
The interaction among the six subpopulations (in-state residential student, out-of-state residential student, non-residential student, faculty, staff, and community) is modeled via the contact matrix C. Individuals in each protection level j transition from the susceptible to the exposed compartment at a rate determined by the transmission parameter β_j, the contact matrix C, the total number of infectious individuals I_tot, and the subpopulation size N. Following Lloyd and Jansen (2004), C is a 6 × 6 matrix, where the component C_kl represents the proportion of individuals in subpopulation k making contacts with individuals in subpopulation l in each time step, with k, l = 1, …, 6 denoting subpopulations in the order listed above.
To account for different interaction patterns across different time periods of the day and day of the week, the contact matrix C assumes different structures during (i) classroom time (weekday, time step 1), (ii) work time (weekday, time step 2), (iii) after hours (weekday, time steps 3-6), and (iv) weekend. Full specification of the contact matrix is presented in Supplementary Appendix 1.
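The time-dependent choice of contact matrix can be sketched as a lookup on day of week and time step, followed by a force-of-infection calculation. In the sketch below, the period names, the convention that day 0 is Monday, and the particular rate formula (a commonly used metapopulation form) are assumptions for illustration; the exact rate expression used in the toolkit is given in the supplementary material, not here. A two-subpopulation example is used for brevity.

```python
def contact_period(day_of_week, time_step):
    """Map day of week (0 = Monday, assumed) and time step (1..6) to a period."""
    if day_of_week >= 5:
        return "weekend"
    if time_step == 1:
        return "classroom"
    if time_step == 2:
        return "work"
    return "after_hours"


def force_of_infection(beta, contact, infectious, sizes):
    """Per-subpopulation infection rate.

    Assumed form: lambda_k = beta * sum_l C[k][l] * I_l / N_l,
    where I_l and N_l are the infectious count and size of subpopulation l.
    """
    return [
        beta * sum(c_kl * i_l / n_l
                   for c_kl, i_l, n_l in zip(row, infectious, sizes))
        for row in contact
    ]


# Hypothetical 2 x 2 contact matrices keyed by period (rows sum to 1).
matrices = {
    "classroom":   [[0.8, 0.2], [0.3, 0.7]],
    "work":        [[0.9, 0.1], [0.2, 0.8]],
    "after_hours": [[0.7, 0.3], [0.1, 0.9]],
    "weekend":     [[0.6, 0.4], [0.1, 0.9]],
}

period = contact_period(day_of_week=0, time_step=1)
rates = force_of_infection(beta=0.5, contact=matrices[period],
                           infectious=[10.0, 0.0], sizes=[100.0, 200.0])
```

Keeping the period lookup separate from the rate calculation makes it straightforward to swap in an institution-specific schedule or contact structure.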

Initial model states
Here we give an overview of the estimation procedure for initial model states in the main analysis. Details are provided in the Supplementary Appendix 1.
Briefly, the number of currently infected individuals is estimated by the total number of infections within the 5 days prior to the follow-up period. Under mandated pre-arrival or arrival testing, infections are divided between the exposed, asymptomatic infectious, and symptomatic infectious compartments. The distribution of infections to each of these compartments is based on the symptomatic infection rate, test sensitivity, and length of the infectivity period for each compartment. The number of individuals in isolation/quarantine is estimated based on the total number of individuals with an exit date from isolation/quarantine after the prediction start date (infected individuals exiting from isolation/quarantine prior to the start of follow-up are considered recovered if within 90 days of follow-up).
The recovered compartment consists of all individuals infected between 5 days and 90 days prior to follow-up. The Spring 2022 and Fall 2022 analyses are subject to underreporting of both previously infected and recovered compartments due to shifts in university testing strategy (from weekly testing to voluntary testing). To account for underreporting, we estimate the number of unrecorded infections and add them to previously infected compartments (if > 90 days since infection) or recovered (if ≤ 90 days since infection). 40 In the community, the proportion of individuals in each protection level is assumed to be the same as the employee subpopulation at Clemson University.
Initial values for the testing, isolation and quarantine compartments are all set to 0. The community baseline infection rate, baseline recovery rate, and the proportion of additional recovered individuals can all be customized in the toolkit.
Extension to other settings
The estimation of initial states for UGA and PSU has several major differences compared to the main analysis. First, from the university dashboard, we do not have sufficient information on the full vaccination rate, the booster rate, the proportion of the previously infected, or the recently recovered. For other institutions, we estimate the missing information using a combination of data collected by Clemson University and data provided by the Centers for Disease Control and Prevention (CDC). The calculation of subpopulation sizes and other details are provided in Supplementary Appendix 1. Second, the reported positive cases during the week prior to the prediction start are based on results from voluntary testing, as opposed to mandatory arrival testing in the main analysis based on Clemson University. These cases are assumed to be the symptomatically infectious individuals at baseline. The initial numbers in the exposed and asymptomatic infectious compartments are derived from this count using 1/σ, 1/γ, and 1/ϕ, which are the mean incubation time, mean symptomatic infectious time before isolation, and mean asymptomatic infectious time, respectively.

Output metrics
We now describe the output metrics in the Toolkit and the associated statistical methods. The Toolkit displays the projection of the weekly symptomatic SARS-CoV-2 cases and the weekly total cases. The weekly cases are provided in two versions: (1) residential students, non-residential students, faculty, staff; and (2) students, employees.
In addition, the Toolkit also displays the projected daily number of university students and employees in isolation housing or quarantine. The projected isolation and quarantine for students includes numbers for out-of-state residential students, all residential students, and all students.
Daily and weekly symptomatic cases. Daily symptomatic cases under this framework consist of two groups of individuals: those who are detected at the beginning of the day, and those who are isolated at each time step of the day. With a time step of four hours (one-sixth of a day), there are six time steps per day. The number of new symptomatic cases on day t is given by Eq. (1), where p is the daily testing proportion, se_I is the testing sensitivity for symptomatic infections, and 1/γ is the mean time of symptomatic infection before isolation. Weekly symptomatic cases are computed by aggregating the daily symptomatic cases over 7 days.
Daily and weekly detected cases. Daily detected cases include the daily symptomatic cases in Eq. (1), the daily detected asymptomatic cases, and the daily detected exposed individuals, as given by Eq. (2), where se_E is the testing sensitivity for the exposed individuals. Weekly detected cases are computed by aggregating the daily detected cases over 7 days.
Total cases. Daily new cases on each day are calculated via the difference in the susceptible compartments between day t-1 and day t, as given by Eq. (3). Weekly new cases aggregate daily new infections over 7 days. Note that the total cases include both detected and undetected cases.
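The total-case calculation can be sketched as a difference of consecutive daily susceptible totals followed by a 7-day aggregation. The sketch assumes the input series already sums every susceptible compartment (across protection levels and subpopulations, including quarantined susceptibles) at the end of each day; the numbers are hypothetical.

```python
def daily_new_cases(susceptible_by_day):
    """New cases on day t = susceptible total on day t-1 minus total on day t."""
    return [prev - cur
            for prev, cur in zip(susceptible_by_day, susceptible_by_day[1:])]


def weekly_totals(daily):
    """Aggregate daily counts into consecutive 7-day blocks (last block may be partial)."""
    return [sum(daily[k:k + 7]) for k in range(0, len(daily), 7)]


# Hypothetical end-of-day susceptible totals for day 0 through day 8.
susceptible = [1000, 990, 975, 960, 950, 945, 942, 940, 939]
daily = daily_new_cases(susceptible)
weekly = weekly_totals(daily)
```

Because the calculation uses only the susceptible totals, it counts every new infection regardless of whether it is later detected, which matches the note that total cases include undetected cases.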
Daily isolation. The number of isolations on day t is the total number of individuals in all isolation compartments.
Daily isolation and quarantine. The number of isolations and quarantines on day t is the total number of individuals in all isolation and quarantine compartments.

Declarations
Acknowledgements: We thank Clemson University's Computing & Information Technology department for their role in data procurement, management, and protocol development. We thank Clemson University's administration, Emergency Operations Center, REDDI Lab, medical staff, housing staff, modeling team, and all other providers who helped implement and manage SARS-CoV-2 testing and other mitigation measures at Clemson University throughout the Covid-19 pandemic. We would like to thank Dr. Christopher McMahan for his role in the development of the preliminary models used in this study and Dr. Delphine Dean for her role in collecting the vast majority of SARS-CoV-2 testing data used in this study.
Funding: This project has been funded (in part) by the National Library of Medicine of the National Institutes of Health under award number R01LM014193.
Clemson University provided salary support for Z.M. and L.R. for consulting and modeling work pertaining to development and evaluation of public health strategies (project #1502934). The content and the decision to publish are solely the responsibility of the authors of this study and do not necessarily represent the official views of the National Institutes of Health or Clemson University.
Author contributions. Z.M. and L.R. wrote the first draft of this study and reviewed the final draft. Z.M. conducted the statistical analyses and mathematical modeling in this study and developed the publicly available toolkit accompanying this study. and L.R. oversaw data collection and processing. L.R. conceptualized and supervised the study and oversaw project administration.
Competing interests: The authors declare that they have no competing interests.
Inclusion & Ethics: All roles and responsibilities were agreed amongst collaborators prior to conducting research. We have taken relevant national and international research related to our study into account in citations.
Modeling framework. The modeling framework of the toolkit includes estimating local disease surveillance metrics, statistical modeling of local disease transmission dynamics, and a compartment-based modeling framework for Covid-19 prediction based on estimated input parameters and publicly available data.
The observed employee cases over the ve weeks were 308, 264, 160, 90, and 54, respectively (total observed cases = 876; % agreement = 93.2%). The % Agreement is calculated as min(O ij ,P ij )/max(O ij ,P ij ), where O ij and P ij are the observed and predicted Covid-19 cases in week i for subpopulation j.
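The % agreement measure can be computed directly from its definition. In the sketch below, the observed weekly employee counts are those reported above, while the predicted counts are hypothetical placeholders used only to show the calculation; treating two zero counts as 100% agreement is also an assumption, since the definition is otherwise undefined in that case.

```python
def percent_agreement(observed, predicted):
    """Return 100 * min(O, P) / max(O, P)."""
    largest = max(observed, predicted)
    if largest == 0:
        return 100.0   # assumption: two zero counts agree perfectly
    return 100.0 * min(observed, predicted) / largest


observed_employee = [308, 264, 160, 90, 54]    # weekly observed cases from the text
predicted_employee = [300, 250, 170, 95, 50]   # hypothetical, for illustration only

weekly_agreement = [percent_agreement(o, p)
                    for o, p in zip(observed_employee, predicted_employee)]
overall_agreement = percent_agreement(sum(observed_employee),
                                      sum(predicted_employee))
```

The measure is symmetric in its arguments, so over-prediction and under-prediction of the same relative size are penalized equally.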