This section has two subsections: one describes the design for the intermediate-risk group and the other describes the design for the high-risk group.
A flowchart illustrating the overall design of both trials is shown in Figure 1.
In our designs, the World Health Organization (WHO) ordinal scale is used to classify COVID-19 patients into stages based on their clinical status. Because it is difficult to design a trial for every stage, we combined stages with similar standard treatment options. Based on the ordinal scale, patients are separated into three groups: High-Risk Group (Stages 6 and 7), Intermediate-Risk Group (Stages 3, 4, and 5), and Low-Risk Group (Stages 1 and 2). Patients in the intermediate-risk group are treated in a similar way, while patients in the high-risk group need innovative and aggressive treatment. We therefore propose two independent clinical trials with different designs, one for the intermediate-risk group and one for the high-risk group. Note that if a patient in the intermediate-risk group does not recover and progresses to the high-risk group, that patient could be eligible for the high-risk group trial.
We discuss considerations and provide specific justifications for five important components of both clinical trial designs: the outcome variables, stratification, interim analysis, group ratio, and toxicity monitoring. Futility stopping rules are also considered in both designs, since there is no need to spend additional resources and effort if the drug is not effective.
Design for Intermediate-Risk Group:
Outcome Variables
Since a larger number of patients are expected in the intermediate-risk group, it is feasible to use binary endpoints (success or failure).
We define the primary response variable in terms of the proportion of patients discharged from the hospital by the 15th day. Let Y=1 indicate success: the patient is discharged from the hospital by the 15th day. Let Y=0 indicate failure: the patient is not discharged from the hospital by the 15th day, has progressed to a higher stage on the WHO scale, or has died. Here, failure is a composite endpoint. This follows the logic adopted in the cardiovascular trials mentioned in the background section. In our case, Y=1 is success with probability P and Y=0 is failure with probability 1-P. Accordingly, we calculated results based on an improvement of the response rate from 40% in the standard arm to various rates (50%, 55%, 60%, 65%, 70%, 75%, 80%) in the treatment arm.
Secondary outcome variables might also be considered, for example the change in viral load or in biomarkers of inflammation such as ferritin or IL-6, the time to reduced viral load, or the number of event-free days in the hospital (event-free survival).
Stratification
For ethical reasons, group sequential designs are recommended in the current setting. Since many factors could affect the outcome, stratified randomization is more suitable. For this purpose, Zelen’s blocked randomization scheme with random block size (randomly selected size 4 or 6) is suggested (24). In previous work, Srivastava (25) found that when several factors with unknown true distributions appear to affect the primary outcome of interest, or when treatment response may be heterogeneous among individuals in a group with unknown effect size, stratified randomization offered consistently better results, provided the effect size can be assumed to be marginally similar within each stratum.
Factors that might affect the primary outcome, such as age, race, sex, co-morbidity and viral load, should be addressed by stratification. Choosing the right stratification factors is critically important, however, and many issues need to be considered. Based on current clinical experience showing a strong dependence of COVID-19 outcomes on age, sex, diabetes, obesity and hypertension (26, 27), we consider four factors for the intermediate-risk group: patient stage, the presence of at least one cardiovascular disease risk factor among obesity, hypertension and diabetes (Yes/No), age (<60 and ≥60 years), and sex (Male/Female).
For the intermediate-risk group, we further classified patients in the three stages into two groups: those in Stages 3 and 4 and those in Stage 5 (essentially separating patients who are not in the ICU from those who are). This is suggested to minimize the number of strata for randomization while ensuring that the patients within each stratum are relatively homogeneous. All these factors are readily identifiable; however, for defining metabolic syndrome status, it may be necessary to include other factors that are representative of a patient’s health condition. Alternatively, composite risk scores such as the Framingham, Reynolds, or GRACE risk scores may be used. Data for calculating a cardiovascular risk score and/or obesity may be readily available, as patients are usually weighed and their blood pressure, cholesterol status, and diabetes status are known upon admission to most hospitals. Although age is usually included in a risk score, it could also be considered as a separate variable when the risk score cannot be calculated. Assuming that no risk scores are available, our recommended design defines two age groups: less than 60 years of age and greater than or equal to 60 years of age. It is generally known that patients in the intermediate-risk group are mostly elderly, and patients less than 50 years of age account for only a very small percentage of the patients admitted to a hospital. Therefore, the cutoff at 60 years of age is selected to obtain balanced strata. Moreover, in view of studies showing that the recovery rate for males is lower than that for females (28), sex should also be considered. With these four two-level factors, there would be a total of 16 strata in the design, which keeps the trial design somewhat manageable. Race is not explicitly considered, as there is no indication yet of race-dependent variations in outcome independent of pre-existing disease burden.
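To make the stratification scheme concrete, the following is a minimal sketch of how the 16 strata and a permuted-block randomization with random block size (4 or 6) could be implemented. It assumes 1:1 allocation; the factor labels, the function names `all_strata` and `permuted_block`, and the class `StratifiedRandomizer` are hypothetical and chosen only for illustration. It is not the randomization software used for any actual trial.

```python
import itertools
import random

# Stratification factors for the intermediate-risk group, as listed in the text.
# The level labels are illustrative assumptions.
FACTORS = {
    "stage": ["3-4", "5"],
    "cv_risk_factor": ["yes", "no"],
    "age": ["<60", ">=60"],
    "sex": ["male", "female"],
}


def all_strata(factors=FACTORS):
    """Enumerate every combination of factor levels (one dict per stratum)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]


def permuted_block(rng, arms=("standard", "treatment"), sizes=(4, 6)):
    """Return one shuffled block of randomly chosen size with equal arm counts (1:1)."""
    size = rng.choice(sizes)
    block = list(arms) * (size // len(arms))
    rng.shuffle(block)
    return block


class StratifiedRandomizer:
    """Keeps a separate sequence of permuted blocks for each stratum."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.pending = {}  # stratum key -> assignments left in the current block

    def assign(self, stratum):
        key = tuple(sorted(stratum.items()))
        if not self.pending.get(key):
            # Current block is exhausted (or none exists yet): draw a new one.
            self.pending[key] = permuted_block(self.rng)
        return self.pending[key].pop()


if __name__ == "__main__":
    strata = all_strata()
    print(len(strata), "strata")
    randomizer = StratifiedRandomizer(seed=2020)
    print([randomizer.assign(strata[0]) for _ in range(6)])
```

Because each stratum draws from its own blocks, the two arms stay balanced within every stratum, and the random block size makes the next assignment harder to predict.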
Interim Analysis
For the intermediate-risk group, two interim analyses are recommended. Results are presented for no interim analysis, one interim analysis and two interim analyses. We describe design parameters at alpha=0.05 and power=90%. Because the virus is life threatening, it is important to ascertain the efficacy of the intervention as early as possible and make the drug available to this patient population as soon as possible. Without interim analysis, researchers would know the outcome of the trial only after all patients have been enrolled. If one chooses to perform one interim analysis when 50% of patients are enrolled, then, using the G-rho spending function with rho equal to 2, one would stop the trial at the interim evaluation if the p-value of the test comparing the two groups is less than 0.006 (29, 30). For futility evaluation, the trial would also be stopped if the p-value is greater than 0.716. Otherwise, the trial should continue to the final analysis, and the efficacy of the treatment should be declared only if the p-value is less than 0.047. However, due to the insidious nature of this infection, waiting until 50% of patients are enrolled to find out the result may still not be aggressive enough. Therefore, to fast-track the process and make sure that the drug can be made available to those who need it urgently, we recommend two interim analyses: the first when one-third of the total patient population has been enrolled and evaluated (effective new treatment if p<0.002; futility if p>0.830), the second when two-thirds of the patients have been enrolled and evaluated (effective treatment if p<0.014; futility if p>0.298), and the final analysis when all patients are enrolled (p<0.046). Rho equal to 3 is used in the G-rho spending function (29, 30).
The choice of rho reflects two considerations: the drug must be made available to patients quickly, but the trial should be stopped early only if there is strong evidence that the drug is effective. This is why we chose somewhat conservative p-value cut-offs at the interim evaluations (ensuring strong evidence in favor of the drug and avoiding false positive findings). As an example, assume the overall sample size is 243, of which 81 patients belong to the standard care arm and the remaining 162 belong to the treatment arm. At the first interim analysis, 54 patients (one third of 162) are in the treatment arm and 27 (one third of 81) are in the standard care arm. If p<0.002, then there is strong evidence that the intervention is working, and the trial should stop right away. With this design, researchers can find out early whether the intervention works and, by monitoring toxicities, can stop the trial if it is causing unacceptable harm to patients. To allow for unforeseen events, such as patients changing their mind about the study after randomization, the sample size should be increased by approximately 5%, giving a total of n=256. If the expected effect is somewhat smaller (such as 10%), the sample size increases drastically (n=978); however, the monitoring rules for efficacy and futility evaluations remain the same.
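The stopping rule for the two-interim-analysis design can be written as a small decision function. The sketch below simply encodes the nominal p-value boundaries quoted above; it does not derive them from the spending function. The names `LOOKS` and `decide` and the returned labels are hypothetical.

```python
# Nominal p-value boundaries quoted in the text for the design with two
# interim analyses. Each entry: (fraction of patients evaluated,
# efficacy boundary, futility boundary). The final look has no futility boundary.
LOOKS = [
    (1 / 3, 0.002, 0.830),
    (2 / 3, 0.014, 0.298),
    (1.0, 0.046, None),
]


def decide(look, p_value):
    """Return the action at a given look (1, 2 or 3) for an observed p-value."""
    _, efficacy, futility = LOOKS[look - 1]
    if p_value < efficacy:
        return "efficacy"  # declare the treatment effective; stop if interim
    if futility is None:
        return "not significant"  # final analysis, boundary not crossed
    if p_value > futility:
        return "futility"  # stop early, treatment unlikely to show benefit
    return "continue"  # enroll up to the next look


if __name__ == "__main__":
    for look, p in [(1, 0.001), (1, 0.50), (2, 0.35), (3, 0.040)]:
        print(f"look {look}, p={p}: {decide(look, p)}")
```

A p-value of 0.50 at the first look falls between the two boundaries, so the trial continues, whereas 0.35 at the second look exceeds the futility boundary of 0.298 and stops the trial.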
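The arithmetic in this example can be checked with a short sketch. It assumes equally spaced looks and rounding up to whole patients, which is consistent with the figures quoted in the text but is our assumption; the function names `look_sizes` and `inflate` are hypothetical.

```python
import math


def look_sizes(n_standard, n_treatment, n_looks=3):
    """Cumulative (standard, treatment) patient counts at equally spaced looks."""
    return [
        (math.ceil(n_standard * k / n_looks), math.ceil(n_treatment * k / n_looks))
        for k in range(1, n_looks + 1)
    ]


def inflate(n_total, dropout=0.05):
    """Increase the total sample size to allow for dropout, rounding up."""
    return math.ceil(n_total * (1 + dropout))


if __name__ == "__main__":
    # 1:2 allocation from the example: 81 standard care, 162 treatment.
    print(look_sizes(81, 162))  # → [(27, 54), (54, 108), (81, 162)]
    print(inflate(81 + 162))    # → 256
```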
Group Ratio
We calculate here sample sizes for both 1:1 and 1:2 randomization for the intermediate-risk group; that is, the number of patients enrolled in the treatment arm may be the same as or twice the number enrolled in the standard treatment arm. The choice of group ratio depends on the efficacy of the intervention in pilot studies. If it is a new intervention that has not been approved by the FDA, then 1:1 randomization with a block size of 4 is recommended in consideration of patients’ safety. If it is an approved procedure or drug with some preliminary data on efficacy and a known toxicity profile, then 1:2 randomization with a block size of 6 is recommended, so that if the drug is effective, more patients benefit from being treated on the more efficacious arm.
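To illustrate how the allocation ratio affects the required sample size, the sketch below uses the standard normal-approximation formula for comparing two proportions in a fixed-sample design (no interim analyses). It is only a rough illustration: it assumes a two-sided alpha of 0.05 and 90% power, it is not the group sequential calculation behind the tables in this paper, and its numbers should not be expected to match them. The function name `fixed_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def fixed_sample_size(p_standard, p_treatment, ratio=1, alpha=0.05, power=0.90):
    """Approximate per-arm sizes for a fixed-sample two-proportion comparison.

    ratio is treatment:standard (1 for 1:1, 2 for 1:2). Uses the unpooled
    normal approximation with a two-sided alpha (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    variance = p_standard * (1 - p_standard) + p_treatment * (1 - p_treatment) / ratio
    n_standard = math.ceil(z_total ** 2 * variance / (p_standard - p_treatment) ** 2)
    return n_standard, ratio * n_standard


if __name__ == "__main__":
    # Response rate of 40% in the standard arm versus the treatment-arm rates in the text.
    for p_treatment in (0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80):
        for ratio in (1, 2):
            n_std, n_trt = fixed_sample_size(0.40, p_treatment, ratio)
            print(f"p_trt={p_treatment:.2f} 1:{ratio} standard={n_std} "
                  f"treatment={n_trt} total={n_std + n_trt}")
```

For a given effect size, 1:2 allocation needs a larger total sample than 1:1; this is the statistical cost of placing more patients on the treatment arm.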
Toxicity Monitoring
Toxicity monitoring is challenging, but necessary. For the intermediate-risk group, a maximum acceptable toxicity rate of 25% is recommended because of the urgent need for drugs for treatment. In other words, if an intervention provides even minimal improvement, it should still be considered, as there is currently no known drug that is 100% effective against COVID-19. Also note that patients may die during treatment from other causes. We recommend monitoring for those toxicities (Common Terminology Criteria for Adverse Events (CTCAE) Grades >2) that are related, possibly related or probably related to the drug, as evaluated by the Data and Safety Monitoring Board (DSMB) (31).
Design for High-Risk Group:
Outcome Variables
The number of patients in the high-risk group at each health care facility is likely to be small. Using survival time as an endpoint may not be ideal because the follow-up is short, it may take a long time to enroll all patients, and there would hardly be any right censoring. In other words, very few patients will survive past the outcome evaluation time, which makes survival time ineffective as an endpoint. Therefore, in our design we focused on reducing the 30-day mortality rate. Note that since there is no censoring and the length of follow-up is very short in this risk group, there is almost no difference between choosing a survival endpoint and a binary endpoint (32).
We define 30-day mortality as the primary outcome. Let Y=1 indicate the death of a patient within 30 days (failure), and let Y=0 indicate a patient still alive on the 30th day (success). Accordingly, we calculated results based on a reduction of the mortality rate from 80% or 70% in the standard arm to various rates (70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%) in the treatment arm.
Stratification
For the high-risk group, stratification is recommended because the sample size is sufficient when 30-day mortality is used as the primary outcome. Enrolling sufficient patients within a given timeframe should not be an issue, assuming the trial is a multi-center trial.
For stratification, factors similar to those discussed for the intermediate-risk group design are recommended, with some modification. We consider three factors for the high-risk group: the presence of at least one cardiovascular disease risk factor among diabetes, hypertension, and obesity (Yes/No), age (<65 and ≥65 years), and sex (Male/Female). With these three factors for stratification, there would be a total of 8 strata in the design. The reason for selecting 65 years of age as the cutoff is that in a recent study of COVID-19, the mortality rate for those aged 18 to 65 years who received mechanical ventilation was 76.4%, and for those over 65 years of age it was 97.2% (33). Zelen’s blocked randomization scheme with random block size (randomly chosen size 4 or 6) is also recommended in this risk group (24).
In the high-risk group, we did not further classify patients by stage (Stages 6 and 7) as we did in the intermediate-risk group. The reason is that the number of patients in Stage 7 is likely to be small, so it is not possible to stratify on these two stages. Ideally, patients would be balanced across arms by stage, but that is not achievable in this case. Since we have chosen other factors for stratification, if extreme bias occurs, then Stage 7 patients should be dropped, and researchers should perform the analysis only on Stage 6 patients, with 80% power.
Interim Analysis
For the high-risk group, two interim analyses along with monitoring for efficacy and futility at overall alpha=0.05 and power=90% are recommended. The reason is that the mortality rate in these patients is high and they need innovative treatments. For example, convalescent plasma therapy has been widely attempted in the high-risk group. However, the levels of neutralizing antibodies in a specific plasma preparation are likely to vary, leading to variable outcomes. Therefore, if the two interim analyses show that patients respond better to plasma with specific antibody titers, then the rest of the patients could be moved quickly to the higher-quality plasma, giving them a higher chance of survival. In this design, the first interim analysis is performed when one-third of the total patient population has been enrolled, has completed 30 days, and has been evaluated (effective new treatment if p<0.002; futility if p>0.830); the second is performed when two-thirds of the patients have been enrolled, have completed 30 days, and have been evaluated (effective if p<0.014; futility if p>0.298); and the final analysis is performed when all patients have been enrolled and have completed 30 days (p<0.046). Rho equal to 3 is used in the G-rho spending function so that the interim analyses are more conservative, which ensures that the treatment is efficacious and reduces the chance of falsely declaring it efficacious, an error that could mean huge losses in invested resources and in lives. Sample sizes for one interim analysis are also calculated and are given in Tables 8 and 9 in the appendix for reference.
Group Ratio
For patients in the high-risk group, 1:2 randomization is recommended, because such patients are in danger and may have failed other treatments. Hence, they should be treated with whatever intervention is available to improve their chances of survival. In addition, the sample size is large enough to handle the 1:2 treatment allocation ratio when a reduction of the mortality rate from 70% to 55% is expected. Estimated sample sizes for 1:1 randomization are also calculated and provided in the appendix for reference.
Toxicity Monitoring
For the high-risk group, no toxicity monitoring is necessary, since mortality rates between 70% and 97% have been reported across many health care facilities. With such a high death rate, it is not necessary to look at the toxicity level. Any intervention that could increase the chances of saving a patient should be utilized, regardless of treatable toxicities. In addition, two interim analyses are built into our design to help stop the trial early if any harmful events are detected.