## Study Design

This three-arm, randomized, sham-controlled trial has been approved by the ethics committees
at all 9 hospitals. Eligible participants with knee osteoarthritis (KOA), diagnosed according to the American
College of Rheumatology criteria [19], are randomly assigned (1:1:1) to receive 24
sessions of electro-acupuncture (EA), manual acupuncture (MA), or sham acupuncture (SA) over 8 weeks.
Block randomization, with block sizes of 6 and 9 chosen at random, is stratified by study centre
and performed via a web-based randomization system. Superficial insertion at non-acupoints with no electric current will be used in the sham
acupuncture group; this is one of the most commonly used approaches for administering
sham treatments in acupuncture trials. The nature of acupuncture means that acupuncturists cannot be blinded to treatment
allocation; however, patients, outcome assessors and statisticians remain masked where possible. Informed consent is obtained from each participant before randomization. The trial
has been registered with ClinicalTrials.gov (NCT03366363).
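The allocation scheme described above (1:1:1 assignment, randomly chosen block sizes of 6 and 9, one sequence per study centre) can be illustrated with a minimal sketch. It is not the trial's web-based system: the function names, centre names, centre sizes, and seed are hypothetical and serve only to show the mechanics.

```python
import random

ARMS = ("EA", "MA", "SA")   # electro-, manual, and sham acupuncture
BLOCK_SIZES = (6, 9)        # block sizes named in the study design


def make_block(size, rng):
    """Return one shuffled block containing an equal number of each arm."""
    block = list(ARMS) * (size // len(ARMS))
    rng.shuffle(block)
    return block


def allocation_sequence(n_participants, rng):
    """Build an allocation list for one centre from randomly sized blocks."""
    sequence = []
    while len(sequence) < n_participants:
        sequence.extend(make_block(rng.choice(BLOCK_SIZES), rng))
    return sequence[:n_participants]  # the final block may be left incomplete


def stratified_sequences(centre_sizes, seed):
    """One sequence per study centre, the stratification factor."""
    rng = random.Random(seed)
    return {c: allocation_sequence(n, rng) for c, n in centre_sizes.items()}


# Hypothetical centres, sizes, and seed, for illustration only.
sequences = stratified_sequences({"centre_A": 54, "centre_B": 36}, seed=2018)
```

Mixing two block sizes makes the end of a block harder to predict than a single fixed size would, while still keeping the arms balanced within every completed block.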

## Objectives

The objective of the current study is to determine whether EA and MA improve outcomes at
8 weeks in patients with knee osteoarthritis. The following two null hypotheses are
tested: (1) there is no difference in response rate between the EA group and the SA group; (2) there is no difference in response rate between the MA group and the SA group.

# Outcomes

## Primary outcome

The primary outcome is the response rate [20], defined as the proportion of patients who simultaneously
achieve the minimal clinically important improvement (MCII) in both the pain and function domains
at 8 weeks post-randomization. The average pain over the previous week is assessed
using an 11-point Numerical Rating Scale (NRS) [21] with scores ranging from 0 to
10. The MCII in the pain domain is defined as 2 points on the NRS [11, 22]. The average function
over the previous week is measured using the Western Ontario and McMaster Universities
Osteoarthritis Index (WOMAC) function subscale [23] with scores ranging from 0 to
68. The MCII in the function domain is defined as 6 points on the WOMAC function subscale
[11, 22]. The responder criteria are presented in Fig. 1. The response rate is also
measured at weeks 4, 16, and 26 after randomization.
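The responder definition can be sketched in a few lines. This sketch assumes that improvement is the absolute decrease from baseline and that reaching the MCII means a decrease of at least the stated number of points; the authoritative criteria are those in Fig. 1, and the function names and example scores here are hypothetical.

```python
PAIN_MCII = 2      # points on the 0-10 NRS
FUNCTION_MCII = 6  # points on the 0-68 WOMAC function subscale


def is_responder(nrs_baseline, nrs_week8, womac_baseline, womac_week8):
    """True when both domains improve by at least their MCII.

    Lower scores are better on both scales, so improvement is
    baseline minus follow-up.
    """
    pain_improvement = nrs_baseline - nrs_week8
    function_improvement = womac_baseline - womac_week8
    return pain_improvement >= PAIN_MCII and function_improvement >= FUNCTION_MCII


def response_rate(records):
    """Proportion of responders among (nrs0, nrs8, womac0, womac8) tuples."""
    return sum(is_responder(*r) for r in records) / len(records)


# Hypothetical scores, for illustration only.
example = [
    (6, 4, 30, 24),  # improves in both domains: responder
    (6, 5, 30, 20),  # pain improves by only 1 point: non-responder
    (6, 3, 30, 26),  # function improves by only 4 points: non-responder
]
rate = response_rate(example)
```

Requiring both thresholds at once is what makes the outcome a composite: a large gain in one domain cannot offset a sub-threshold change in the other.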

## Secondary outcomes

Numerical Rating Scale [21]: an 11-point patient reported outcome measure (PROM) with
scores ranging from 0 (no pain) to 10 (worst pain).

WOMAC [23] pain subscale: a 5-item PROM with total scores ranging from 0 to 20. Higher
scores indicate worse pain.

WOMAC [23] function subscale: a 17-item PROM with total scores ranging from 0 to 68.
Higher scores indicate worse physical function.

WOMAC [23] stiffness subscale: a 2-item PROM with total scores ranging from 0 to 8.
Higher scores indicate more stiffness.

Patient global assessment [24]: a 5-point Likert scale. Participants are asked how their knee symptoms were
during the past week. The answers include ‘extremely improved’, ‘slightly improved’, ‘not changed’, ‘slightly
aggravated’, and ‘extremely aggravated’.

12-item Short Form Health Survey (SF-12) [25] physical dimension: total scores range
from 0 to 100. Higher scores indicate a better quality of life.

SF-12 [25] mental dimension: total scores range from 0 to 100. Higher scores indicate
a better quality of life.

The NRS, WOMAC, patient global assessment and SF-12 are measured at 4, 8, 16, and 26 weeks
after randomization. Blinding is assessed at 4 and 8 weeks after randomization.
Credibility and expectancy of participants are measured 5 minutes after the first
acupuncture session [26]. The use of rescue medication is also recorded throughout the trial.

## Safety outcome

Adverse events are recorded throughout the trial. Based on the potential relationship
between acupuncture and adverse events, adverse events are categorized as treatment-related
or not.

## Sample size

Based on the results of a previous trial [16], the response rates in the EA, MA and SA
groups are assumed to be 70%, 60% and 40%, respectively. With a 2-sided significance
level of 2.5% and power of 80%, 128 participants in each group will be required to
detect a difference as small as 20 percentage points between each acupuncture group and the control group.
The 2-sided significance level of 2.5% is a Bonferroni-adjusted alpha level that accounts for
the two predefined primary comparisons: EA vs. SA and MA vs. SA. Allowing for an estimated loss-to-follow-up rate of 20%, a total of 480 participants across the three groups
will be recruited.
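The arithmetic behind these figures can be checked with a short sketch. The text does not state which sample-size formula was used; the version below assumes the normal-approximation formula for comparing two proportions with the Fleiss continuity correction, applied to the smaller of the two assumed differences (60% vs. 40%). Under those assumptions it returns 128 per group and 480 in total. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha, power):
    """Per-group n for a two-sided two-proportion z-test.

    Assumption: pooled-variance normal approximation followed by the
    Fleiss continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    diff = abs(p1 - p2)
    p_bar = (p1 + p2) / 2
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    n_corrected = n / 4 * (1 + math.sqrt(1 + 4 / (n * diff))) ** 2
    return math.ceil(n_corrected)


def inflate_for_dropout(n, dropout_percent):
    """Smallest integer m such that m * (1 - dropout) >= n (integer arithmetic)."""
    return -(-n * 100 // (100 - dropout_percent))


# MA vs. SA is the smaller assumed difference, so it drives the sample size.
n = n_per_group(0.60, 0.40, alpha=0.025, power=0.80)
total = 3 * inflate_for_dropout(n, 20)
```

Without the continuity correction the same inputs give a smaller per-group number, so the choice of formula matters when reproducing the stated 128.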

# Statistical analysis

## Statistical analysis population

The modified full analysis set (mFAS), per-protocol set (PPS), and safety set (SS) will be used in the current trial.

The mFAS will consist of all randomized participants who have at least one post-baseline measurement,
according to the modified intention-to-treat principle. Logistic regression will be used to examine whether the data are missing at random [27]. If the data are missing at random, multiple imputation will be used [28]. The mFAS will be the primary analysis set, and all analyses will be conducted in this population
unless otherwise stated. Analyses on the mFAS will provide an estimate of the effect of electro-acupuncture and manual acupuncture.

The PPS will include those who complete the treatment and follow-up on schedule according to
the protocol without major violations. Major protocol violations will be judged during
the blinded audit of data and include, but are not limited to: not meeting the inclusion criteria or
meeting the exclusion criteria, receiving other treatments during the trial that might affect symptoms
of KOA, and completing fewer than 20 sessions of acupuncture. The PPS will be the
secondary analysis set and will be used for sensitivity analyses.

Participants who are randomized and receive at least one session of acupuncture will form
the SS, which is used for safety analyses.

## General analysis principles

All data will be summarized by treatment group. Numbers (percentages) will be used
to describe categorical data. Either means (standard deviations) or medians (interquartile
ranges) will be used for quantitative data depending on whether the variables are
normally distributed or not. Unless otherwise stated, the significance level will
be set at 0.05. The Bonferroni method will be used to adjust the significance level for the multiple comparisons
of the primary outcome. The conclusion will be based on the analysis of the primary outcome, and all secondary
outcomes will be analyzed to support the primary analysis. All analyses will be carried
out using SAS 9.3 (Cary, NC).

## Descriptive analyses

The number of participants screened, excluded, randomly assigned to each group, interviewed
at each follow-up, and analyzed will be summarized using the flow diagram recommended
by CONSORT [29] (Fig. 2). Reasons for losses to follow-up and withdrawals will also be listed
by treatment arm.

Demographic characteristics and clinical outcomes at baseline will be presented in
Table 1. When testing differences among the three groups, either one-way analysis of variance
(ANOVA) or the Kruskal-Wallis test (if normality is violated) will be used for
continuous variables. The chi-square test or Fisher's exact test will be used for categorical
variables. Missing baseline characteristics will not be imputed. Differences
among the treatment groups at baseline will not be statistically tested.

## Analysis of primary outcome

For the analysis of the primary outcome, the response rates of the three groups at 8 weeks
will be calculated and compared using the Z-test for proportions in the
mFAS. Missing data at 8 weeks will be imputed using the baseline value. There will be two comparisons: the first between the electro-acupuncture
group and the sham acupuncture group, and the second between the manual
acupuncture group and the sham acupuncture group. The significance level will be set
at 0.025 for these multiple comparisons using the Bonferroni method.
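Each of the two primary comparisons amounts to a two-sided Z-test on two proportions, judged against the Bonferroni-adjusted threshold. The sketch below assumes the pooled-variance form of the test, which the text does not specify; the planned analysis will be run in SAS, and the counts shown are hypothetical, not trial results.

```python
import math
from statistics import NormalDist

BONFERRONI_ALPHA = 0.05 / 2  # two primary comparisons: EA vs. SA and MA vs. SA


def two_proportion_z_test(responders_a, n_a, responders_b, n_b):
    """Pooled two-sided z-test for a difference in response rates."""
    p_a, p_b = responders_a / n_a, responders_b / n_b
    pooled = (responders_a + responders_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, for illustration only (not trial results).
z, p = two_proportion_z_test(96, 160, 64, 160)
significant = p < BONFERRONI_ALPHA
```

Because each comparison is tested at 0.025, the chance of at least one false-positive finding across the two comparisons stays at or below 0.05.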

## Analysis of secondary outcomes

For the NRS score, the three groups will be compared using a mixed-effects model
for repeated measures (MMRM), with NRS scores at all follow-up time points
as the dependent variable, treatment as the main factor, and the baseline value as a covariate. We set the model as

$$
y_{ij} = \alpha + u_i + \beta_1 \cdot \mathrm{time}_{ij} + \beta_2 \cdot \mathrm{treat}_i + \varepsilon_{ij},
$$

where $\alpha$ is the overall mean, $u_i$ is an unknown random effect representing the
subject-specific effect, and $\beta_1$ and $\beta_2$ are unknown fixed effects representing
the time and treatment effects, respectively. The covariance matrix $G$ is set to be
unstructured, with $u_i \sim N(0, G)$. The random error is $\varepsilon_{ij} \sim N(0, R_i)$.
The MMRM for secondary outcomes will be fitted with PROC MIXED (SAS). The unknown
parameters will be estimated using the expectation-maximization algorithm.
We expect that the expectation-maximization algorithm will converge given the total sample
size of 480 across the three groups and a single random intercept. If non-convergence
does occur, we will consider strategies such as adjusting the initial values, changing
the random effects, or using another analysis method such as generalized estimating equations.
We will also test the estimators or models using likelihood-based tests and the Bayesian
information criterion. A modified MMRM that adds the center effect and the time*treatment
effect will be presented in the sensitivity analysis. The modified MMRM is as follows:

$$
y_{ij} = \alpha + u_i + \beta_1 \cdot \mathrm{time}_{ij} + \beta_2 \cdot \mathrm{treat}_i + \beta_3 \cdot \mathrm{time}_{ij} * \mathrm{treat}_i + \beta_4 \cdot \mathrm{center}_i + \varepsilon_{ij},
$$

where $\alpha$, $u_i$, $\beta_1$, $\beta_2$, and $\varepsilon_{ij}$ are defined as above, and
$\beta_3$ and $\beta_4$ are unknown fixed effects. The same approach will be used to analyze
the WOMAC pain, function, and stiffness subscales and the SF-12. If there is a normality
violation in the continuous variables, a transformation will be performed before the
comparison. The chi-square test will be used for patient global assessment. These outcomes
will be shown in Table 2.

## Safety analyses

Based on the potential relationship between acupuncture and adverse events, adverse
events are categorized as treatment-related or not. Acupuncture-related adverse events
will be summarized by group and compared using Chi-square test (or Fisher exact test).

## Blinding analyses

Kappa analysis will be used to determine whether participants correctly guessed their
group assignment at a higher rate than would be expected by chance.
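Cohen's kappa compares the observed agreement between actual and guessed allocation with the agreement expected by chance. The sketch below is a minimal illustration; the text does not specify how guesses are coded, so the labels ("real", "sham") and the example data are hypothetical.

```python
from collections import Counter


def cohen_kappa(actual, guessed):
    """Cohen's kappa between actual allocation and participants' guesses."""
    if len(actual) != len(guessed) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    observed = sum(a == g for a, g in zip(actual, guessed)) / n
    actual_counts, guess_counts = Counter(actual), Counter(guessed)
    labels = set(actual_counts) | set(guess_counts)
    # Chance agreement: product of the marginal proportions, summed over labels.
    expected = sum(actual_counts[c] * guess_counts[c] for c in labels) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical data, for illustration only.
actual = ["real"] * 4 + ["sham"] * 4
guessed = ["real", "real", "sham", "sham", "real", "real", "sham", "sham"]
kappa = cohen_kappa(actual, guessed)
```

A kappa near 0 indicates that guesses match allocation no better than chance, which is the pattern expected when blinding is successful.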

## Additional analyses

Three further approaches to handling missing data for the primary outcome will be applied
to examine the robustness of the conclusions. First, missing data at 8 weeks will
be imputed using the last observation carried forward approach; second, participants with
missing data will be excluded; third, missing data at 8 weeks will be imputed using multiple
imputation [28]. Assuming the data are missing at random, the missing data will be imputed using the Markov
chain Monte Carlo method for multiple imputation with PROC MI (SAS). The initial seed will be set to 1000
and five datasets will be imputed. Missing data for the primary outcome will be imputed from
the observed values of age, gender, BMI, Kellgren-Lawrence (KL) grade, and duration of disease.

Sensitivity analyses of the primary and secondary outcomes will be carried out in the PPS to
examine the robustness of the conclusions.

Several studies have shown that centre-stratified randomization leads to
correlation among treatment groups. Therefore, we will consider a generalized linear
mixed-effects model for the primary outcome to analyze the group effect, in which centre
effects are included.

Subgroup analysis based on Kellgren-Lawrence grade will be performed.
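The single-value treatments of a missing week-8 score described in this section (the baseline value used in the primary analysis, last observation carried forward, and exclusion) can be contrasted in a small sketch. Multiple imputation is not covered here. The function name, the dictionary layout, and the example scores are hypothetical, and the sketch assumes the baseline score is always observed.

```python
def week8_value(scores, method):
    """Return the week-8 score to analyse under one missing-data scheme.

    scores maps week (0 = baseline, 4, 8) to a score, or None when missing.
    Assumption: the baseline score (week 0) is always observed.
    """
    if scores.get(8) is not None:
        return scores[8]                  # observed: nothing to impute
    if method == "baseline":
        return scores[0]                  # baseline value replaces week 8
    if method == "locf":
        observed = [w for w in sorted(scores) if w < 8 and scores[w] is not None]
        return scores[observed[-1]]       # last observation carried forward
    if method == "complete_case":
        return None                       # participant excluded from the analysis
    raise ValueError(f"unknown method: {method}")


# Hypothetical NRS scores with week 8 missing, for illustration only.
nrs = {0: 6, 4: 3, 8: None}
by_method = {m: week8_value(nrs, m) for m in ("baseline", "locf", "complete_case")}
```

Imputing the baseline value implies no change from baseline, so such a participant cannot reach the MCII; last observation carried forward instead retains any improvement already seen at week 4.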