We will conduct a systematic review. We report this protocol according to the PRISMA-P guidelines and provide the checklist in an additional file. We registered this protocol with the Open Science Framework (Registration ID: https://osf.io/2kwrz).
Studies will be eligible if they a) are written in English, b) involve human participants, and c) are protocols, or primary or secondary reports, of planned or completed SW-CRTs. A paper and its protocol will be considered as one completed original study. We will include only studies that use cluster randomisation with a minimum of two sequences and three periods, or three sequences and two periods. The design need not have all clusters starting in the control condition and ending in the intervention condition, and need not have complete data. We will not place any restriction on interventions, comparators, or outcomes. We will consider both healthcare and non-healthcare settings. We will exclude a) individually randomised trials; b) bidirectional crossover trials; c) non-randomised stepped wedge designs; and d) trials retrospectively analysed as a stepped wedge design when the study was not originally designed as one. We will also exclude systematic reviews, editorials, design manuscripts, and letters.
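The design criterion above can be written as a simple check. This is only an illustrative sketch: the function name `meets_design_criterion` and its arguments are hypothetical, and the check covers the sequence and period rule only, not the other eligibility criteria.

```python
def meets_design_criterion(n_sequences: int, n_periods: int) -> bool:
    """Return True if a cluster randomised design has at least two sequences
    and three periods, or at least three sequences and two periods."""
    two_by_three = n_sequences >= 2 and n_periods >= 3
    three_by_two = n_sequences >= 3 and n_periods >= 2
    return two_by_three or three_by_two
```

Under this rule, a design with two sequences and two periods would not qualify, while a design with two sequences and three periods would.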
Information sources and search strategy
We will use an adaptation of three previously published search strategies of systematic reviews on stepped wedge design [7, 19, 57, 58]. We will search the following databases up to 16 October 2020: PubMed, Embase, PsycINFO, Web of Science, CINAHL, Cochrane Library and Current Controlled Trials Register. Our search terms are “stepped wedge”, “step wedge”, “experimentally staged introduction”, “one-way crossover”, “one directional crossover”, “one way cross-over” and “SW-CRT”, as well as the 28 combinations of the terms “incremental”, “phased”, “staggered”, “staged”, “stepwise”, “step wise” and “delayed” with the terms “recruitment”, “introduction”, “implementation” and “intervention”. An information specialist will first run the search in PubMed and will then translate it for the other databases. A second information specialist will review the initial search strategy using the Peer Review of Electronic Search Strategies (PRESS) tool. Our search strategy in PubMed is described in Table 2. We will report on the search process following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram. The flow diagram is presented in Fig. 2. We will also search grey literature by contacting stepped wedge design experts in relevant networks such as the Ottawa Methods Centre (OMC) at the Ottawa Hospital Research Institute, Canada, the Pragmatic Clinical Trials Unit (PCTU) in London, UK, and the National Institutes of Health, USA.
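The 28 combined terms arise from pairing each of the seven modifier terms with each of the four target terms. The sketch below enumerates them and joins them with OR. It is only an illustration of how the count arises; the variable names are ours, the output is not the strategy in Table 2, and no database-specific field tags are applied.

```python
from itertools import product

# Seven modifier terms and four target terms, as listed in the search strategy.
modifiers = ["incremental", "phased", "staggered", "staged",
             "stepwise", "step wise", "delayed"]
targets = ["recruitment", "introduction", "implementation", "intervention"]

# Every modifier paired with every target: 7 x 4 = 28 phrases.
phrases = [f"{m} {t}" for m, t in product(modifiers, targets)]

# Illustrative query fragment: quoted phrases joined with OR.
query_fragment = " OR ".join(f'"{p}"' for p in phrases)

print(len(phrases))  # 28
```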
We will first merge the citations identified from the seven electronic databases listed above in EndNote. We will then identify and remove duplicates. Only unique citations will proceed to the selection process.
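For readers unfamiliar with citation de-duplication, the idea can be sketched as follows. This is not EndNote's algorithm. It is a minimal illustration that assumes each citation is a dictionary with hypothetical `title` and `year` fields, and that treats two records as duplicates when their normalised title and year match.

```python
import re

def dedup_key(record):
    """Build a matching key from the lower-cased title, with punctuation
    and repeated whitespace collapsed, plus the publication year."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record.get("year"))

def remove_duplicates(records):
    """Keep the first occurrence of each key and preserve input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact-key matching of this kind misses duplicates whose titles differ between databases, so manual checking would still be needed in practice.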
Study selection process
We will complete the screening in four steps. First, we will randomly select a small sample (10%) of the unique records identified. Two reviewers will independently screen these records using our inclusion and exclusion criteria. We will assess the agreement between screeners to ensure that the eligibility screening process is reproducible and reliable. We will describe the observed proportions of articles on which pairs of screeners agree or disagree in their eligibility decisions and calculate Kappa statistics. Disagreements will be resolved through discussion. If necessary, we will modify the instructions and re-test on another sample to improve the agreement between screeners. Second, once agreement is acceptable, the two reviewers will independently screen the remaining titles and abstracts. Third, we will obtain full-text articles for all studies that passed title and abstract screening and confirm their eligibility. In addition, we will examine the reference lists of relevant systematic reviews and eligible papers for additional papers that the search strategy may have missed. A paper and its protocol will be considered as one completed original study. Studies found not to meet the eligibility criteria will be excluded, and we will document the reason for exclusion. Finally, from full-text screening, we will identify SW-CRTs in which subgroup analyses were performed. We will email study authors when information relevant to the selection decision is missing or unclear. Discrepancies between the two reviewers will be resolved by consensus through discussion with a third reviewer.
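The pilot step can be illustrated with a short sketch that draws a 10% random sample and computes Cohen's kappa for two screeners. We assume the unweighted Cohen's kappa for two raters, since the protocol does not name a variant. The function names, the seed and the labels are hypothetical.

```python
import random
from collections import Counter

def pilot_sample(record_ids, fraction=0.10, seed=2020):
    """Draw a simple random sample of records for pilot screening."""
    ids = list(record_ids)
    k = max(1, round(len(ids) * fraction))
    return random.Random(seed).sample(ids, k)

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1 - expected)
```

For example, if two screeners agree on 8 of 10 records and each includes 4 of the 10, the observed agreement is 0.80, the chance agreement is 0.52, and kappa is (0.80 - 0.52) / (1 - 0.52), which is approximately 0.58.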
We will develop a data extraction form (an Excel spreadsheet) by examining items reported in other relevant systematic reviews on subgroup analyses or on SW-CRTs [19, 40, 54, 66, 67] and by developing new items specific to our review. The data extraction form will be pilot tested on a sample of 10% of the trials (for which full-text screening is complete) to refine the extraction items and to ensure that data are collected consistently. Two reviewers will extract data from these trials, and discrepancies will be identified and resolved through discussion. The data extraction form will be revised if necessary. Once consensus is reached, the form will be finalised, and the trials will be randomly divided among the reviewers, with two reviewers assigned to each trial. Reviewers will meet periodically (e.g. after every five trials have been completed) to review discrepancies and come to a consensus. If consensus cannot be reached, a third investigator will resolve differences. Kappa statistics will be calculated to determine the agreement between the independent reviewers. We will extract data in multiple domains related to our objectives. First, we will extract information on study characteristics: first author, title, year of publication, type of article (e.g. primary report or secondary report of a completed trial), country, setting (e.g. non-health care), type of design (e.g. cross sectional), number of clusters, number of sequences, number of periods, number of participants per step, step length, trial duration, and presence of subgroup analyses. Second, we will extract characteristics of the subgroup variables examined: number of subgroup variables, whether subgroup variables were defined at cluster level or individual level, type of each subgroup variable (e.g. categorical), number of categories, how subgroup variables were measured during the data collection phase (e.g. continuous), and how the cut-off was justified (if categorised).
Third, we will extract detailed information about the statistical methods used to perform subgroup analyses in SW-CRTs. We will consider a) methods used for the outcome on which subgroup analyses were based: outcome variable (e.g. primary outcome), type of outcome variable (e.g. continuous), unit of analysis (e.g. individual-level analysis), distribution (e.g. binomial), statistical models (e.g. Generalised Linear Mixed Model), association measure (e.g. risk ratio), assumptions about the correlation structure (e.g. model suggested by Hussey et al.), whether time was modelled as discrete or continuous, whether the analysis adjusted for time, whether there was any interaction between time and treatment, and whether the effect of the intervention on the outcome of interest was statistically significant. We will then consider b) statistical methods specific to subgroup analyses: methods used to assess subgroup effect (e.g. test for interaction), scale on which interaction was assessed (e.g. multiplicative), whether the test for interaction was statistically significant, which interaction terms were included in the statistical model, what statistical methods were used to adjust for multiple subgroup hypothesis testing, and whether authors claimed a subgroup effect (reported treatment effect by subgroup instead of overall treatment effect). Fourth, we will examine adherence to the most consistently suggested recommendations for subgroup analyses in general, including in clinical trials (Table 1): whether a rationale for subgroup analyses was provided, type of subgroup analysis (e.g. prespecified), whether a formal test for interaction was used to assess subgroup effect, whether multiple subgroup analyses were performed, and whether a correction for multiplicity was applied when multiple subgroup analyses were performed.
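To make the items on correlation structure and tests for interaction concrete, the linear mixed model commonly attributed to Hussey and Hughes can be written as below, followed by one possible extension with a subgroup-by-treatment interaction. The extension and the symbols $Z_{ijk}$, $\gamma$ and $\delta$ are our illustrative assumptions and are not taken from any included trial.

```latex
% Hussey and Hughes model: individual k in cluster i at period j
Y_{ijk} = \mu + \alpha_i + \beta_j + \theta X_{ij} + e_{ijk},
\qquad \alpha_i \sim N(0, \tau^2), \quad e_{ijk} \sim N(0, \sigma^2)

% Illustrative extension with a subgroup variable Z_{ijk}
Y_{ijk} = \mu + \alpha_i + \beta_j + \theta X_{ij} + \gamma Z_{ijk}
          + \delta X_{ij} Z_{ijk} + e_{ijk}
```

Here $\alpha_i$ is a random cluster effect, $\beta_j$ is a fixed effect for period $j$, $X_{ij}$ indicates whether cluster $i$ receives the intervention in period $j$, and $\theta$ is the intervention effect. In the extension, a test for interaction corresponds to testing $H_0\colon \delta = 0$. With an identity link, as written, $\delta$ measures interaction on the additive scale; in a generalised linear mixed model with a log link, the analogous coefficient would measure interaction on the multiplicative scale.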
As our study is a systematic review of statistical methods used to perform subgroup analyses in SW-CRTs, we will not assess the quality of the included trials. In addition, the quality of subgroup analyses in SW-CRTs will not be assessed since there is no methodological tool designed for this purpose.
Kappa statistics will be calculated to determine the agreement between reviewers for the main items. We will consider agreement as “fair” when kappa values are between 0.40 and 0.59, “good” when values are between 0.60 and 0.74, and “excellent” when values are greater than or equal to 0.75. Our results will be presented separately for protocols and for completed original studies. We will perform a descriptive analysis in four steps. First, we will describe the characteristics of all included SW-CRTs. We will use the median (interquartile range) to describe continuous variables and the frequency (percentage) to describe categorical variables. We will then identify the number of completed original studies (or protocols) that performed (or planned to perform) subgroup analyses, overall and in each category of study. Second, we will describe the characteristics of the subgroup variables examined. Third, we will identify and describe the statistical methods used to perform subgroup analyses in SW-CRTs. Fourth, we will determine, for each of the five items (Table 1), the prevalence of adherence of the included trials to the most consistently suggested recommendations for subgroup analyses in general, including in clinical trials. Analyses will be performed using SAS software, version 9.4.
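The agreement thresholds and the median (interquartile range) summary can be sketched as follows. The planned analyses will be run in SAS, so this Python version is only an illustration with hypothetical function names. Two assumptions are ours: kappa values are treated as continuous, so that a value such as 0.595 falls in the “fair” band, and values below 0.40, which the protocol does not label, are returned as unlabelled.

```python
from statistics import median, quantiles

def agreement_category(kappa):
    """Map a kappa value to the agreement labels used in this protocol."""
    if kappa >= 0.75:
        return "excellent"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "fair"
    return "not labelled in the protocol"  # assumption: below 0.40 has no label

def median_iqr(values):
    """Return (median, first quartile, third quartile) of a numeric sample.

    Uses the 'inclusive' quartile method; other software may use a
    different quartile definition and give slightly different values.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3
```

For instance, `agreement_category(0.62)` returns `"good"`, and `median_iqr([1, 2, 3, 4, 5])` returns a median of 3 with quartiles 2 and 4.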