3.1 Study Characteristics
The literature selection process is shown in Figure 1. Screening of titles and abstracts identified 151 potentially eligible trials, of which 130 were excluded after full-text reading and comparison against the inclusion criteria. The main reasons for exclusion were the involvement of physiotherapists, incomplete data, and the use of other equipment or techniques. Two papers were duplicate publications, and only one of them was retained. One additional study meeting the inclusion criteria was identified from reference lists. In total, 22 full-text studies were included in the analysis[35-56]. The exercise interventions took a variety of forms, most commonly dance and Tai chi. Intervention length ranged from 4 weeks to 12 months, most often 12 weeks. Training frequency ranged from 1 to 5 sessions per week, most often 2 to 3, and sessions lasted 50, 60, or 90 minutes. The characteristics of all included studies are shown in Table 2.
3.2 Methodological Quality
Risk of bias was assessed with the Cochrane Collaboration tool, and the methodological quality of the included trials varied widely. Blinding of participants is generally not possible in exercise trials because of the nature of the intervention, and exercise performed outside the intervention cannot be controlled. Twenty-five studies (92.59%) were rated adequate for allocation concealment, but only 18 (66.67%) reported how the random sequence was generated. Blinding of outcome assessors was not reported in 9 trials (33.33%). Most trials reported missing data and participant dropouts. The risk of publication bias and other bias was low. The overall risk of bias across the included studies and the assessment for each study are shown in Figures 2 and 3.
3.3 Effects of exercise
3.3.1 Outcome of UPDRS-III
Eighteen of the 27 included studies reported UPDRS-III scores. The pooled estimate (Figure 4a) showed that exercise had a positive effect on UPDRS-III scores (MD = -5.83; 95% CI, -8.29 to -3.37; P < 0.00001). However, heterogeneity was high (P < 0.00001; I² = 93%), and removing one or more studies did not substantially change the overall heterogeneity. Two studies showed no improvement: in one[56], the exercise group scored worse than the control group before the intervention, and in the other[36], the between-group difference was not statistically significant. Nevertheless, there was no sound reason to exclude either study from the analysis. In the funnel plot (Figure 5a), eight studies fell outside the 95% confidence limits; this was explored further in the subgroup analysis.
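The pooled MD and I² reported above come from an inverse-variance meta-analysis. The following is a minimal Python sketch, assuming a DerSimonian-Laird random-effects model (the text does not state which model or software was used). The study values in the example are hypothetical, not data from the included trials, and `leave_one_out` is an illustrative helper that mirrors the sensitivity check of removing studies one at a time.

```python
import math


def dersimonian_laird(md, se):
    """Random-effects pooled mean difference (DerSimonian-Laird).

    md: per-study mean differences; se: their standard errors.
    Returns the pooled MD, its 95% CI, two-sided P, tau^2, Q and I^2 (%).
    """
    w = [1 / s ** 2 for s in se]  # fixed-effect inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, md)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, md))  # Cochran's Q
    df = len(md) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (s ** 2 + tau2) for s in se]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, md)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(pooled / se_pooled) / math.sqrt(2))  # two-sided normal P
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"md": pooled, "ci": ci, "p": p, "tau2": tau2, "q": q, "i2": i2}


def leave_one_out(md, se):
    """Re-pool the studies with each one omitted in turn."""
    return [dersimonian_laird(md[:i] + md[i + 1:], se[:i] + se[i + 1:])
            for i in range(len(md))]


# Hypothetical UPDRS-III mean differences (exercise minus control) and SEs
md = [-8.0, -2.5, -6.0, -1.0, -10.0]
se = [1.5, 1.2, 2.0, 1.0, 2.5]
result = dersimonian_laird(md, se)
print(f"MD = {result['md']:.2f}, I2 = {result['i2']:.0f}%")
for i, r in enumerate(leave_one_out(md, se)):
    print(f"without study {i + 1}: I2 = {r['i2']:.0f}%")
```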
3.3.2 Outcome of TUG
Seventeen of the 27 included studies reported TUG results. The TUG test is widely used to assess a patient's ability to transfer. The pooled estimate (Figure 4b) suggests that exercise improves TUG performance (MD = -2.22; 95% CI, -3.02 to -1.42; P < 0.00001), although heterogeneity was high (P < 0.00001; I² = 84%). In the funnel plot (Figure 5b), many studies fell outside the 95% confidence limits, and removing any single study or type of study did not change the funnel plot; this was explored further in the subgroup analysis.
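Studies that fall "outside the 95% confidence interval" of a funnel plot are those whose effect lies beyond the pseudo-confidence limits, pooled estimate ± 1.96 × SE, drawn around the pooled estimate at each study's standard error. A minimal sketch of that check follows; the inputs are hypothetical, and the function names are illustrative rather than taken from any library.

```python
def funnel_limits(pooled, se, z=1.96):
    """Pseudo 95% confidence limits of a funnel plot at a given SE."""
    return pooled - z * se, pooled + z * se


def outside_funnel(md, se, pooled, z=1.96):
    """Indices of studies lying outside the pseudo 95% limits."""
    outside = []
    for i, (y, s) in enumerate(zip(md, se)):
        lower, upper = funnel_limits(pooled, s, z)
        if not lower <= y <= upper:
            outside.append(i)
    return outside


# Hypothetical study MDs and SEs, with a pooled MD of -3.7
md = [-8.0, -2.5, -6.0, -1.0, -10.0]
se = [1.5, 1.2, 2.0, 1.0, 2.5]
print(outside_funnel(md, se, pooled=-3.7))
```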
3.3.3 Outcome of UPDRS
The UPDRS reflects the overall severity of PD; however, only 6 of the 27 included studies reported UPDRS scores. The pooled estimate (Figure 4c) indicates that exercise is beneficial in terms of the UPDRS score (MD = -7.80; 95% CI, -10.98 to -6.42; P = 0.02), with substantial heterogeneity (P = 0.02; I² = 60%). The funnel plot (Figure 5c) suggested some bias across the studies.
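Because a mean-difference confidence interval is symmetric on the original scale, the standard error and two-sided P value implied by a reported 95% CI can be recovered as SE = (upper - lower) / (2 × 1.96) and z = MD / SE. The short sketch below performs this consistency check under a normal approximation, as in standard inverse-variance meta-analysis; the numbers are hypothetical and the helper names are illustrative.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def p_from_ci(md, lower, upper):
    """Two-sided P value for MD = 0, back-calculated from the 95% CI."""
    z = md / se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


def ci_centred(md, lower, upper, tol=0.05):
    """True if the CI midpoint matches the MD (allowing for rounding)."""
    return abs((lower + upper) / 2 - md) <= tol


# Hypothetical result: MD = -2.00 (95% CI -3.96 to -0.04)
print(se_from_ci(-3.96, -0.04), p_from_ci(-2.0, -3.96, -0.04))
```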
3.3.4 Outcome of BBS
Ten of the 27 included studies reported BBS results; a few other studies used different scales to assess balance. The pooled estimate (Figure 4d) indicates a beneficial effect of exercise on balance as measured by the BBS (MD = 4.52; 95% CI, 2.72 to 5.78; P = 0.002), with substantial heterogeneity (P = 0.002; I² = 66%). The funnel plot (Figure 5d) suggested some bias across the studies.
3.3.5 Outcome of 6MWT
Six of the 27 included studies reported 6MWT results. The pooled estimate (Figure 4e) indicated an improvement in the 6MWT (MD = 68.81; 95% CI, 32.14 to 105.48; P < 0.0001), suggesting that exercise can improve patients' motor capacity. Heterogeneity was high (P < 0.0001; I² = 83%). The funnel plot (Figure 5e) suggested some bias across the studies.
3.4 Optimal parameters of exercise
We performed subgroup analyses, discussed in detail below, for outcomes reported in more than ten studies. The results of the subgroup analyses by exercise parameter are shown in Table 3.
3.4.1 Training frequency
Subgroup analysis by weekly training frequency showed a significant difference between subgroups for TUG (P < 0.00001, I² = 92.4%) but not for UPDRS-III scores (P = 0.09, I² = 57.6%). For UPDRS-III, both two and three sessions per week had significant effects, with MDs of -6.65 (95% CI: -10.19 to -3.11) and -3.21 (95% CI: -5.98 to -0.44), respectively. For TUG, three sessions per week had a significant effect (MD = -5.70; 95% CI: -6.33 to -4.77), whereas two sessions per week did not (MD = -0.93; 95% CI: -2.53 to 0.66). Twice a week was the most common frequency and the most effective in terms of UPDRS-III scores. The between-subgroup heterogeneity for TUG was markedly reduced when the four-times-a-week subgroup was removed (P = 0.46, I² = 0%).
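The subgroup P and I² values above come from a test for subgroup differences, which compares the pooled estimates of the subgroups with a chi-square statistic on (number of subgroups - 1) degrees of freedom. A stdlib-only sketch follows. For illustration it uses only the two UPDRS-III subgroups whose estimates are given above (two and three sessions per week), with SEs back-calculated from their 95% CIs, so it is not expected to reproduce the reported P = 0.09. `chi2_sf` is a helper for integer degrees of freedom, written here to avoid external dependencies.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x^((2i-1)/2)
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def subgroup_difference(md, se):
    """Chi-square test for differences between subgroup pooled estimates."""
    w = [1 / s ** 2 for s in se]
    overall = sum(wi * yi for wi, yi in zip(w, md)) / sum(w)
    q = sum(wi * (yi - overall) ** 2 for wi, yi in zip(w, md))
    df = len(md) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, chi2_sf(q, df), i2


# UPDRS-III, twice vs three times weekly; SE = CI width / (2 * 1.96)
md = [-6.65, -3.21]
se = [(-3.11 + 10.19) / 3.92, (-0.44 + 5.98) / 3.92]
q, df, p, i2 = subgroup_difference(md, se)
print(f"Q = {q:.2f}, df = {df}, P = {p:.3f}, I2 = {i2:.1f}%")
```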
3.4.2 Duration of each session
Session durations were 50, 60, and 90 minutes, and studies were divided into three corresponding subgroups. There was no significant difference between subgroups in the effect on UPDRS-III scores (P = 0.6; I² = 0%). The 60-minute session was the most common choice and had a significant effect (MD = -5.82; 95% CI, -8.75 to -2.89). Similarly, there was no difference between subgroups for TUG (P = 0.89; I² = 0%), and 60-minute sessions had a significant effect on TUG scores (MD = -1.96; 95% CI, -2.79 to -1.13). Weekly frequency affects motor symptoms, and its effect may also depend on the outcome being evaluated.
3.4.3 Duration of exercise per week
Because session frequency and duration varied, total weekly training time also varied between studies. Weekly exercise time was divided into 60-minute, 120-minute, 150-minute, 180-minute, and >180-minute subgroups. The differences between subgroups were not significant for UPDRS-III scores (P = 0.24, I² = 29.1%) or TUG (P = 0.36, I² = 6.3%). A total of 120 to 180 minutes of exercise per week was the most commonly chosen parameter. The relationship between total weekly exercise time and efficacy is examined further in the meta-regression analysis.
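Weekly exercise time is presumably the product of sessions per week and minutes per session; for example, 2 × 60 = 120 and 3 × 50 = 150 minutes. The sketch below assigns a study to the subgroups listed above under that assumption. The text does not say how totals between the listed values (for example 2 × 50 = 100 minutes) were handled, so the hypothetical `weekly_subgroup` helper raises an error for them rather than guessing.

```python
SUBGROUPS = (60, 120, 150, 180)  # weekly-minute subgroups listed in Section 3.4.3


def weekly_minutes(sessions_per_week, minutes_per_session):
    """Total exercise time per week (assumed: frequency x session length)."""
    return sessions_per_week * minutes_per_session


def weekly_subgroup(total):
    """Label a weekly total with its subgroup; unlisted totals are rejected."""
    if total > 180:
        return ">180"
    if total in SUBGROUPS:
        return str(total)
    raise ValueError(f"{total} min/week is not one of the listed subgroups")


for freq, minutes in [(1, 60), (2, 60), (3, 50), (2, 90), (3, 60), (3, 90)]:
    total = weekly_minutes(freq, minutes)
    print(f"{freq} x {minutes} min = {total} min/week -> {weekly_subgroup(total)}")
```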
3.4.4 Exercise types
For the primary outcome, UPDRS-III, the exercise types were mainly tango and Tai chi. We selected studies with 2- to 3-month interventions and divided them into tango and Tai chi subgroups, comprising 4 and 6 studies, respectively. The subgroup analysis showed no significant difference between the two (P = 0.39, I² = 0%; Figure 6).
3.5 Meta-regression analysis
Because substantial heterogeneity remained after the subgroup analyses, we performed univariable and multivariable meta-regression analyses to identify its sources (Table 4). The covariates were mean age, exercise modality (dance, Chinese martial arts, or other), number of sessions per week, duration of each session, total weekly exercise time, length of the intervention, and participants' region (Eastern or Western countries). In the univariable meta-regression, mean age was significantly associated with UPDRS-III scores (β = -1.242; 95% CI, -2.27 to -0.206; P = 0.022). In the multivariable analysis, age was no longer significant (β = -0.962; 95% CI, -2.053 to 0.128; P = 0.078), whereas region may be an influencing factor (β = -9.406; 95% CI, -17.423 to -1.389; P = 0.026).
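Meta-regression is a weighted regression of study effect sizes on study-level covariates, with weights 1 / (SE² + τ²). The following is a minimal univariable sketch, assuming the residual between-study variance τ² is already known. In practice τ² is estimated (for example by REML or the method of moments), and software such as Stata's `metareg` or R's `metafor` may also apply a Knapp-Hartung adjustment, which this sketch omits. The multivariable model in Table 4 needs the matrix form of the same weighted least squares and is not shown. All inputs below are hypothetical.

```python
import math


def meta_regression(y, se, x, tau2=0.0):
    """Univariable random-effects meta-regression by weighted least squares.

    y: study effect sizes (e.g. UPDRS-III MD); se: their standard errors;
    x: a study-level covariate (e.g. mean age, or 0/1 for region);
    tau2: residual between-study variance, assumed known here.
    """
    w = [1 / (s ** 2 + tau2) for s in se]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    beta = sxy / sxx              # change in MD per unit change in x
    se_beta = math.sqrt(1 / sxx)  # SE of the slope with known variances
    p = math.erfc(abs(beta / se_beta) / math.sqrt(2))  # two-sided normal P
    return {
        "beta": beta,
        "intercept": ybar - beta * xbar,
        "ci": (beta - 1.96 * se_beta, beta + 1.96 * se_beta),
        "p": p,
        "se": se_beta,
    }


# Hypothetical studies: mean age and UPDRS-III MD lying exactly on a line
age = [60, 65, 70, 75]
md = [-5.0, -6.25, -7.5, -8.75]
fit = meta_regression(md, [1.0, 1.0, 1.0, 1.0], age, tau2=0.0)
print(f"beta = {fit['beta']:.3f}, P = {fit['p']:.3f}")
```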