Search strategy
We carried out a comprehensive search for literature indexed in PubMed, EMBASE, Cochrane Library, Web of Science, and SCOPUS databases with publication dates ranging from the date of inception of each database until January 8, 2019. The search terms included “hot flashes” and “acupuncture”. Only clinical trials were included and the language was restricted to English. The detailed search strategies are described in the Supplementary Search Strategy.
Inclusion and exclusion criteria
Only studies that met the following criteria were included: (1) randomized controlled acupuncture trials, (2) subjects were pre-menopausal and post-menopausal women or patients with a history of breast cancer, (3) studies reporting the frequency of hot flashes at baseline and after treatment, and (4) if the study was a cross-over design, only data from the first period were included in order to avoid the legacy effect. Any cross-over trials that failed to report outcomes in the first stage were excluded. To reduce heterogeneity, participants with other comorbidities or interventions of acupuncture combined with other treatments were excluded.
Data extraction
Data were independently extracted by two researchers and checked by a third person for any discrepancies. A Microsoft Excel (version 16.0.12130.20232) database was used to categorize relevant information from the included studies. The information collected included literature characteristics (author, year of publication, and clinical trial registration number), trial design (grouping, sample size, treatment duration, blinding method, frequency of acupuncture treatment, and total acupuncture times), characteristics of participants (age, weight, region, frequency of hot flashes at baseline, duration of disease, and type of subjects), and clinical outcomes (frequency of hot flashes for each follow-up time point).
Data from intention-to-treat (ITT) groups were entered when available. For published per-protocol (PP) data, unreported sample sizes during treatment were calculated based on the endpoint sample size. If efficacy results were presented in graphs, the digitizing software Engauge Digitizer (version 4.1, 2002, by Mark Mitchell) was used to extract the graphical data. Data extraction errors between two independent researchers were not allowed to exceed 2%. If the error was greater than 2%, data extraction was repeated and the mean values were considered as the final results.
Risk of bias assessment
Two investigators independently extracted relevant information and assessed the risk of bias using the Cochrane Risk of Bias Tool [24]. Any disagreement was resolved through discussion with a third investigator. The evaluation items included random sequence generation, allocation concealment, blinding of participants and personnel, blinding for outcome assessment, incomplete outcome data, selective reporting, and other biases. Other biases were defined as trials in which baseline characteristics were not comparable between different treatment groups.
These studies were graded as high, moderate, or low quality based on the following criteria: (1) studies were considered to be of high quality when both randomization and allocation concealment were assessed as low risks of bias, and all other items were assessed as low or unclear risk of bias in a trial; (2) studies were considered to be of low quality if either randomization or allocation concealment was assessed as a high risk of bias, regardless of the risk of other items; (3) studies were considered to be of moderate quality if they did not meet the criteria for high or low quality.
Model building
In a previous study, we found that changes in the frequency of hot flashes varied with time and reached a plateau, which is consistent with the Emax model. The Emax model demonstrates good biological plausibility and is frequently used for modeling pharmacodynamic properties of drugs [25]. This model has two important parameters, Emax and ET50. Emax is the maximum possible efficacy that treatment can achieve. ET50 is the time point at which 50% of the maximum efficacy has been achieved, which represents the speed of the onset of treatment. The combinations of different Emax and ET50 values resulted in different time-course curves. In the current study, the Emax model was used to fit the longitudinal efficacy data of each treatment group with the Emax and ET50 values of each treatment group being obtained by Bayesian feedback.
Heterogeneity among studies led to large differences in Emax and ET50 values for different treatment groups. However, some of the differences could be explained by covariates. In the current study, factors that would potentially affect model parameters, such as age, weight, baseline frequency of hot flashes, duration of disease, blinding method, type of subjects, and number of acupunctures per week, were screened as covariates on parameters for both Emax and ET50. A detailed description of the construction of the model is shown in Supplementary.
Model evaluation
The goodness-of-fit of the final model was evaluated using diagnostic graphs. The sensitivity of the final model was evaluated using the leave-one-out cross-validation (LOOCV) method [26]. Briefly, data from one trial were sequentially dropped from the full data set and the remaining data were used to fit the final model. The parameter estimates obtained from each data set were compared to investigate the stability of the model. The final model was further validated by a visual predictive check (VPC) [27], which is commonly used to determine if a model is able to reproduce the variability and main trend of observed data. Typically, 1000 datasets were modeled using Monte Carlo simulations based on the final model parameters. The observed data were then compared with the 2.5th, 50th, and 97.5th percentiles of the simulated data to assess the predictive capacity of the final model.
Typical efficacies analysis
When a covariate was found to have a significant impact on the model parameters, the parameters were corrected for the covariate to ensure that the parameters were comparable at different covariate levels. The methodology for covariate correction is detailed in the Supplementary. A single-arm meta-analysis with random-effects model was then used to summarize the model parameters according to each type of treatment. The typical value and 95% confidence interval (CI) of the parameters of each treatment were obtained from the analysis. Based on these parameters, typical efficacy and 95% CI of each treatment at different time points were simulated 1000 times using Monte Carlo Simulation.
Software
Model parameter estimation was performed using NONMEM 7.4.1 (level 1.0, ICON Development Solutions, New York, NY, USA). Meta-analysis was performed using StataCorp (2013. Stata Statiscal Software: Release 13. College Station, TX, USA: StataCorp LP). Plotting, model simulation, and statistical analysis were performed using R software (version 3.6.0, The R Foundation of Statistical Computing, Vienna, Austria).