Study Design
This was an observational study with a cross-sectional design and repeated measurements.
Participants
All participants had genetically confirmed DMD and underwent regular follow-up under the Paediatric Neuromuscular Disorder Program and the Pulmonary Rehabilitation Program at our hospital. The study period was from 2016 to 2019. The inclusion criteria were: Chinese, a genetically and clinically compatible diagnosis of DMD, aged 4 years or above, and able to understand and follow instructions. The exclusion criteria were: recent orthopedic interventions on the upper limbs, poor eyesight despite corrective lenses, and recent acute medical conditions affecting general status. Ethical approval was obtained from the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster. Written informed consent was obtained from patients aged 18 or above, or from a parent or guardian for patients aged under 18.
Sample Size Calculation
All sample size estimations were based on an alpha level of 0.05 and a power of 0.8. To establish intra-rater and inter-rater reliability, the participants were evaluated twice. For high test-retest and inter-rater reliability of the test (ICC ≥ 0.8), a minimum sample size of 25 individual assessments would be required to detect significant changes. Since DMD progresses from the ambulatory to the non-ambulatory stage from about age 10 onwards, the ratio of ambulatory to non-ambulatory participants was set at 1:2.
Qualification and Training of Assessors
All three raters (raters A, B and C) were registered physiotherapists with over 10 years of clinical experience with paediatric patients. All were trained in the administration of the PUL for DMD by studying the instruction manual. Prior to data collection, all raters took part in a pilot trial in which 2 individuals with DMD were assessed. The ratings were compared, and any discrepancies were discussed thoroughly until consensus was reached.
Measurements
For the reliability study, twenty-three patients participated, two of whom were assessed at both the ambulatory phase and, later, the non-ambulatory phase. This gave a total of 25 individual PUL for DMD assessments. Information on demographic data, physical and neurological examination, and steroid usage was collected from a systematic review of medical records. To study reliability, each patient underwent the PUL for DMD assessment, conducted by one of the three registered physiotherapists, and the entire assessment was video-recorded. The other two raters, who did not perform the assessment, watched the video of the patient's PUL assessment and scored it (inter-rater reliability). One month later, the three raters, blinded to the participants' data, independently scored the assessments again (intra-rater reliability).
For the construct validity study, thirty-three DMD patients participated, with the number of patients differing between the correlation analyses. Validation was conducted by comparing the PUL for DMD with age, forced vital capacity (FVC % predicted value) and the Hammersmith Functional Motor Scale, all of which were obtained on the same day as the PUL assessment. Changes in serial PUL for DMD scores with age were also analyzed.
The PUL for DMD (Version 1.3) was used. The test consists of one entry item, a high-level shoulder dimension with 4 items, a mid-level elbow dimension with 9 items, and a distal-level wrist and hand dimension with 8 items. The score range differs between items, with individual item scores falling between 0 and 6. The total score ranges from 0 to 74. Serial measurements of the PUL at different ages for the 23 DMD patients during their subsequent clinical follow-up were also included in the analysis.
Forced vital capacity (FVC % predicted value), which measures the total amount of air a person can exhale during a forced breath, was used to assess cough effort. As part of the pulmonary function test, FVC % was obtained by spirometry (Pony FX, COSMED, Rome, Italy) with the subject in a sitting position, on the same day as the PUL for DMD assessment.
The Hammersmith Functional Motor Scale [20] is a validated test that evaluates the child’s ability to perform various motor activities. Performance was rated by the registered physiotherapist who had administered the PUL to the patient on the same day. The scale consists of 20 items assessing rolling, sitting, standing, stairs, and static and dynamic standing balance. Each item is scored from 0 to 2. The total score ranges from 0 (all activities failed) to 40 (all activities completed).
Data Analysis
IBM SPSS for Windows (version 20, IBM Corporation, Armonk, NY, USA) was used for statistical analysis unless indicated otherwise. The level of significance was set at P≤0.05 to reduce the probability of making a type I error due to the many variables involved. Demographic data were analysed using descriptive statistics (e.g., means and standard deviations).
Floor and Ceiling Effects
The proportion of individuals with the lowest and the highest possible scores for each test was examined. Floor or ceiling effects were considered substantial if the proportion was greater than 20%. The 20% cut-off has commonly been used to define a substantial floor or ceiling effect in previous studies that assessed the psychometric properties of different measurement tools in various patient populations. The coefficient of skewness for the distribution of scores was assessed using MedCalc [21-23] (version 16.2, MedCalc Software bvba, Ostend, Belgium). A positive skewness value γ1>1.0 or a negative skewness value γ1<-1.0 indicates substantial skewness, and may indicate a floor or a ceiling effect, respectively.
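The floor/ceiling and skewness checks described above can be sketched as follows. This is a minimal illustration only: the function names and the example scores are hypothetical, and the skewness shown is the plain moment coefficient (third central moment divided by the 1.5th power of the second), which may differ slightly from the sample-adjusted value a statistics package reports.

```python
def floor_ceiling_proportions(scores, min_score, max_score):
    """Return the proportions of participants at the lowest and highest possible score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


def skewness(scores):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # second central moment
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical total scores on a 0-74 scale (not study data).
example_scores = [0, 0, 0, 10, 74]
floor, ceiling = floor_ceiling_proportions(example_scores, 0, 74)
substantial_floor = floor > 0.20      # threshold taken from the text above
substantial_ceiling = ceiling > 0.20
print(floor, ceiling, substantial_floor, substantial_ceiling)
print(skewness([0, 0, 0, 0, 10]))     # positive value: scores cluster at the low end
```

A skewness above 1.0 in this sketch would point towards a possible floor effect, and a value below -1.0 towards a possible ceiling effect, in line with the thresholds stated above.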
Item Difficulty and Item Discrimination Index
The item difficulty index and item discrimination index were used to evaluate the effectiveness of individual test items. As the item scores of the PUL for DMD are polytomous, item difficulty was expressed as the item mean on the point scale. A good item has an item mean close to half of the maximum score. The minimum and maximum item mean bounds represent the cut-off points at which the item mean is considered too low (i.e., a difficult item) or too high (i.e., an easy item), respectively [24]. Factors of 0.3 and 0.6 were used to compute the minimum and maximum item mean bounds, respectively. The number of categories for the items and whether the item responses begin at 0 must be considered when setting these bounds. For an item rated on a 4-point scale (0-3), a mean score ≤0.9 (3×0.3) was considered to indicate a difficult item and a mean score ≥1.8 (3×0.6) an easy item; items with a mean score between 0.9 and 1.8 were considered to have an acceptable difficulty level. For an item rated on a 3-point scale (0, 1, 2), a mean score ≤0.6 (2×0.3) was considered to indicate a difficult item and a mean score ≥1.2 (2×0.6) an easy item. For an item rated on a 5-point scale (0-4), a mean score ≤1.2 (4×0.3) was considered to indicate a difficult item and a mean score ≥2.4 (4×0.6) an easy item.
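The bounds above follow directly from the item's maximum score, as the short sketch below illustrates. The function names are hypothetical, and the code assumes item responses start at 0, as they do in the examples given in the text.

```python
def item_mean_bounds(max_score, low_factor=0.3, high_factor=0.6):
    """Return (minimum, maximum) item mean bounds for an item scored 0..max_score."""
    # Rounding removes binary floating-point noise (e.g. 3 * 0.3 = 0.8999999999999999)
    # so that means lying exactly on a bound are classified as the text describes.
    return round(low_factor * max_score, 10), round(high_factor * max_score, 10)


def classify_item_difficulty(item_mean, max_score):
    """Label an item as 'difficult', 'easy' or 'acceptable' from its mean score."""
    lower, upper = item_mean_bounds(max_score)
    if item_mean <= lower:
        return "difficult"
    if item_mean >= upper:
        return "easy"
    return "acceptable"


for max_score in (2, 3, 4):
    print(max_score, item_mean_bounds(max_score))

print(classify_item_difficulty(1.5, 3))  # hypothetical item mean on a 0-3 item
```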
The item discrimination index was expressed as the correlation between the item score and the total score (Pearson’s product-moment correlation), which can theoretically range from -1 to 1. An item discrimination index <0.4 indicated that the item was ineffective and might require further examination to determine whether it could be revised before being discarded [25].
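As an illustration of the item discrimination index, the sketch below computes the Pearson correlation between one item's scores and the total scores and flags the item against the 0.4 threshold. The function names and data are hypothetical; the total score here includes the item itself, which is one reading of the description above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_discrimination(item_scores, total_scores, threshold=0.4):
    """Return (index, is_effective) for one item against the total score."""
    r = pearson_r(item_scores, total_scores)
    return r, r >= threshold


# Hypothetical scores for four participants (not study data).
item = [0, 1, 2, 3]
total = [10, 20, 30, 40]
print(item_discrimination(item, total))
```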
Reliability Analysis
The internal consistency of the PUL for DMD was assessed using Cronbach’s alpha (>0.8: excellent, 0.7-0.8: moderate, <0.7: poor), including subscale analysis. Intraclass correlation coefficients (ICC) were used to estimate the intra-rater (ICC(3,1)) and inter-rater (ICC(2,1)) reliability (poor: <0.40, adequate: 0.40≤ICC≤0.75, excellent: >0.75) [26]. Using the intra-rater reliability results, the standard error of measurement (SEM) was derived from the formula SEM = Sx × √(1 − rxx), where Sx is the standard deviation of the test total score and rxx is the reliability coefficient. The minimal detectable change at the 95% confidence level (MDC95) was computed using the formula MDC95 = 1.96 × SEM × √2 [27].
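The formulas above can be illustrated with a short worked sketch. The function names and all numerical inputs are hypothetical and are not study results; Cronbach's alpha is computed with the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total), which is assumed here to match what the statistics package computes.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists aligned by participant."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per participant
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def standard_error_of_measurement(sd_total, reliability):
    """SEM = Sx * sqrt(1 - rxx)."""
    return sd_total * sqrt(1 - reliability)


def mdc95(sem):
    """MDC95 = 1.96 * SEM * sqrt(2)."""
    return 1.96 * sem * sqrt(2)


# Hypothetical inputs: SD of total score = 10 points, intra-rater ICC = 0.96.
sem = standard_error_of_measurement(10.0, 0.96)
print(round(sem, 2), round(mdc95(sem), 2))

# Two hypothetical items with identical scores give the maximum alpha.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
```

With these made-up inputs the SEM is about 2 points and the MDC95 about 5.5 points, meaning that a change smaller than that could not be distinguished from measurement error.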
Validity Analysis
Known-groups validity was established by assessing whether the PUL for DMD could significantly differentiate between ambulatory and non-ambulatory subjects. A receiver operating characteristic (ROC) curve analysis was carried out to assess the ability of the test to classify these patient groups, generating area under the curve (AUC) values (outstanding discrimination: AUC ≥0.9; excellent discrimination: AUC=0.8-0.9; acceptable discrimination: AUC=0.7-0.8) [28]. The positive and negative likelihood ratios (+LR, -LR) were also determined. An LR value of 1.0 indicates that the test is useless for discrimination. A +LR > 2 and a -LR < 0.5 can be considered clinically important [27].
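For reference, the likelihood ratios follow from sensitivity and specificity at a chosen cut-off score: +LR = sensitivity / (1 − specificity) and -LR = (1 − sensitivity) / specificity. The sketch below uses a hypothetical 2×2 classification table; which group is treated as the "positive" class and the counts themselves are assumptions for illustration only.

```python
def likelihood_ratios(true_pos, false_neg, false_pos, true_neg):
    """Return (+LR, -LR) from the four cells of a 2x2 classification table."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


# Hypothetical counts at one cut-off score (not study data).
plr, nlr = likelihood_ratios(true_pos=18, false_neg=2, false_pos=1, true_neg=9)
clinically_important = plr > 2 and nlr < 0.5  # thresholds from the text above
print(round(plr, 2), round(nlr, 2), clinically_important)
```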
Construct validity was assessed by comparing the PUL for DMD with forced vital capacity and the Hammersmith Functional Motor Scale, using the Pearson correlation coefficient to assess the relationships. A coefficient below 0.2 represents a very weak or no relationship, 0.2 to 0.4 a weak relationship, 0.4 to 0.6 a moderate relationship, 0.6 to 0.8 a strong relationship, and 0.8 to 1.0 a very strong relationship [27].
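The interpretation bands above can be written as a small lookup. This is a sketch only: the function name is hypothetical, the absolute value of the coefficient is used so that negative correlations are graded by magnitude, and assigning each shared boundary value (0.2, 0.4, 0.6, 0.8) to the higher band is an assumption, since the text does not say which band a boundary belongs to.

```python
def correlation_strength(r):
    """Map a Pearson coefficient to the descriptive bands used in the text."""
    magnitude = abs(r)
    if magnitude < 0.2:
        return "very weak or none"
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.6:
        return "moderate"
    if magnitude < 0.8:
        return "strong"
    return "very strong"


# Hypothetical coefficients (not study results).
for r in (0.1, -0.5, 0.85):
    print(r, correlation_strength(r))
```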