Study design
Pilot, cross-sectional, genetic association study based on the candidate gene approach.
Participants and Setting
A total of 254 participants were recruited by convenience sampling between 2013 and 2018 from the waiting room of the Oncology Clinic of a tertiary public teaching hospital in South Africa. All eligible participants (Table 1) who agreed to participate gave written informed consent. The recruited participants self-identified as being of ‘mixed-ancestry’ ethnicity, a genetically admixed population ancestrally derived from immigrants from Western Europe, West Africa and Asia, and from the indigenous Southern African populations [19].
Study procedures
This study received ethical approval from the Human Research Ethics Committee at the University of Cape Town (HREC REF: 359/2019). Study procedures have been previously reported [7]. Briefly, eligible, consenting participants completed the Shoulder Pain and Disability Index (SPADI) questionnaire and had blood drawn by venipuncture at the cubital fossa of the unaffected side into EDTA vacutainer tubes. Whole blood samples were immediately stored at -20°C until total DNA extraction using the method described by Lahiri et al. [20]. Extracted DNA was stored long-term at -20°C. Relevant information for each participant, including age, tumor grade, surgery data and adjuvant therapy data, was obtained from participants’ medical records.
Patient reported outcome measure
The primary outcome measure in this study was the SPADI, a validated and reliable patient-reported questionnaire with two domains: Pain (5 items) and Disability (8 items) [21, 22]. Participants rated the pain or difficulty associated with specific activities of daily living on a visual analog scale (VAS) from 0 (no pain/difficulty) to 10 (extreme pain/difficulty). Symptom scores for both SPADI domains were reported as percentages of the possible total scores [22].
Pain and disability scores were categorized according to score effects on activities of daily living and clinical relevance [7]. The reference ‘no – low’ category consisted of participants with SPADI pain/disability scores <30 whereas the case ‘moderate – high’ category consisted of participants with SPADI pain/disability scores ≥30.
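As an illustration of the scoring and dichotomization described above, the following minimal Python sketch converts item ratings into a percentage of the possible total and applies the 30-point threshold. The function names and the example ratings are hypothetical and are not taken from the study's analysis; the handling of missing items is not addressed.

```python
def spadi_percent(item_scores, max_per_item=10):
    """Return a SPADI domain score as a percentage of the possible total."""
    if not item_scores:
        raise ValueError("at least one item score is required")
    if any(s < 0 or s > max_per_item for s in item_scores):
        raise ValueError("item scores must lie between 0 and max_per_item")
    return 100.0 * sum(item_scores) / (max_per_item * len(item_scores))


def spadi_category(percent_score, threshold=30.0):
    """Dichotomize a percentage score at the threshold stated in the text."""
    return "moderate-high" if percent_score >= threshold else "no-low"


# Hypothetical ratings for the five Pain items (0-10 each).
pain_items = [2, 4, 3, 5, 1]
pain_percent = spadi_percent(pain_items)   # 15 of a possible 50 points
pain_group = spadi_category(pain_percent)
```

The same two functions would apply to the eight Disability items, since each domain is scored against its own possible total.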
Genetic variables
The exposures in this study were the genotypes of single nucleotide polymorphisms (SNPs) within four candidate proteoglycan genes: ACAN (rs1126823 G>A, rs1516797 G>T, rs2882676 A>C); BGN (rs1042103 G>A, rs743641 A>T, rs743642 G>T); DCN (rs516115 C>T); and VCAN (rs11726 A>G, rs2287926 G>A, rs309559).
Single Nucleotide Polymorphism (SNP) selection
SNPs with global minor allele frequency >0.15 in the ENSEMBL database (http://www.ensembl.org) were selected for investigation based on meeting one or more of the following criteria:
- Identified from a whole exome sequencing project on risk factors for tendinopathy or musculoskeletal soft tissue injuries [23].
- Functional significance, based on reported effects on gene expression or protein function.
- Located in regulatory gene regions.
- Previous associations with multifactorial soft-tissue shoulder conditions.
A total of ten SNPs within four proteoglycan-encoding genes were included (Tables 4 and 5). To ensure robust genetic association analyses, only SNPs with call rates >95% and Hardy-Weinberg equilibrium p-values >0.05 were included.
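The two quality-control criteria above can be sketched as follows. This is a minimal illustration only: it assumes a chi-square goodness-of-fit test with one degree of freedom for Hardy-Weinberg equilibrium, which may differ from the test implemented in the R package used in the study, and the function names and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes and returns
    (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes_qc(n_aa, n_ab, n_bb, n_attempted,
              min_call_rate=0.95, min_hwe_p=0.05):
    """Apply the call-rate (>95%) and HWE (p > 0.05) thresholds from the text."""
    if n_attempted <= 0:
        raise ValueError("n_attempted must be positive")
    call_rate = (n_aa + n_ab + n_bb) / n_attempted
    _, p_value = hwe_chi_square(n_aa, n_ab, n_bb)
    return call_rate > min_call_rate and p_value > min_hwe_p


# Hypothetical counts: 100 samples attempted, all called, in exact HWE.
snp_ok = passes_qc(25, 50, 25, n_attempted=100)
```

For small samples or rare alleles an exact test is usually preferred over the chi-square approximation shown here.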
Genotype determination
Genotyping was performed using TaqMan™ assays (Applied Biosystems) in 96-well plates, following the manufacturer’s instructions, on a QuantStudio™ 3 Real-Time PCR System (Applied Biosystems) at the Division of Exercise Science and Sports Medicine, University of Cape Town. Negative controls (no DNA sample), positive controls (DNA of known genotypes) and replicates (sample duplicates) were included in every plate to evaluate the reliability of the PCR and to detect potential genotyping errors. The genotyping data were analyzed using the Thermo Fisher Cloud Genotyping Analysis Software (version 3.3.0-SR2-build 21) with automatic genotype calling for nine SNPs: ACAN (rs1126823 G>A, rs1516797 G>T); BGN (rs1042103 G>A, rs743641 A>T, rs743642 G>T); DCN (rs516115 C>T); and VCAN (rs11726 A>G, rs2287926 G>A, rs309559). Because amplification was less efficient for the ACAN rs2882676 A>C SNP, its genotypes were called manually and compared with the manual calls of an independent, blinded technical support member, yielding 99.7% agreement.
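The agreement between two independent sets of manual calls can be expressed as a simple percentage of matching samples. The sketch below shows one way to compute it; the function name and the example calls are hypothetical, and the study does not state how its agreement figure was calculated.

```python
def concordance(calls_a, calls_b):
    """Percentage of samples for which two callers assigned the same genotype."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both callers must score the same samples")
    if not calls_a:
        raise ValueError("no calls supplied")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100.0 * matches / len(calls_a)


# Hypothetical calls for four samples from two independent callers.
caller_1 = ["A/A", "A/C", "C/C", "A/C"]
caller_2 = ["A/A", "A/C", "C/C", "C/C"]
agreement = concordance(caller_1, caller_2)
```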
Bias
Nine percent (23 of 254) of participants did not provide a blood sample because they were lost to the study after consenting, when they went for further medical examination in the clinic. Although participants who provided blood may differ from those who did not, this is unlikely because all participants were randomly identified and consented.
Statistical analysis
The sample size calculation for this study, performed using QUANTO version 1.2.469 [24], was described previously [7]. A sample size of N=231 was considered likely to be sufficient to detect odds ratios of ≥2.5 for allele frequencies ≥0.15, assuming an expected average baseline risk for shoulder pain (32%) and disability (25%), under dominant or additive genetic models [7].
Demographic and clinical data were analyzed using Statistica version 13.2.70 [25]. Mann-Whitney U tests were used to evaluate differences in quantitative characteristics between the shoulder pain/disability categories because the data were not normally distributed. Fisher’s exact and Chi-square tests were used to evaluate differences in categorical demographic and clinical characteristics between the shoulder pain/disability categories.
The genotype data were analyzed using RStudio version 1.3.895 running R version 3.6.3 [26, 27]. Chi-square and Fisher’s exact tests were used to evaluate differences in the genotype, allele and inferred haplotype frequencies between the shoulder pain/disability categories. Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were calculated using the R package ‘genetics’ version 1.3.8.1.2 [28]. Logistic regression analyses were performed using the R package ‘SNPassoc’ version 1.9-2 to evaluate the association between SNP genotype and shoulder pain/disability category membership [29]. The best model, defined as the one with the lowest Akaike Information Criterion (AIC), was chosen from among the dominant, recessive and log-additive models. Using the R package ‘haplo.stats’ version 1.7.9 [30], inferred haplotypes for the ACAN, BGN and VCAN polymorphisms were constructed from the genotype data for each SNP investigated. To investigate possible gene-gene interactions in modulating risk for shoulder pain/disability, inferred allele combinations were constructed using the relevant genotype data for the genes. The choice of SNPs for inferred allele combination construction was based on stepwise backward elimination logistic regression analysis. In each step, the least informative SNPs, whose exclusion lowered (and therefore improved) the AIC of the model, were removed until three SNPs remained, representing the best three-SNP model for shoulder pain or disability. To avoid saturating the models while controlling for confounding, only participants’ age, which was shown to be associated with the primary outcomes, was included as a covariate in all multivariate regression models. For all inferred haplotypes or allele combinations, a minimum frequency cut-off of 4% was applied to improve validity. Stepwise regression analyses were performed using the R package ‘MASS’ version 7.3-51.5 [31]. The R package ‘ggplot2’ version 3.2.1 was used to produce all graphs [32]. The level of significance was set at p<0.05.
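To make the three genetic models concrete, the sketch below shows how a genotype might be coded as a predictor under each model, and how the model with the lowest AIC would then be selected. It is a conceptual illustration in Python, not the study's R code: the function names, the "G/A" genotype format and the AIC values in the usage example are hypothetical, and the actual coding and model fitting were handled by the R packages named above.

```python
def encode_genotype(genotype, minor_allele, model):
    """Code a genotype such as 'G/A' as a numeric predictor under a genetic model."""
    alleles = genotype.split("/")
    if len(alleles) != 2:
        raise ValueError("genotype must contain exactly two alleles, e.g. 'G/A'")
    count = sum(allele == minor_allele for allele in alleles)
    if model == "dominant":
        return int(count >= 1)   # carriers of at least one minor allele
    if model == "recessive":
        return int(count == 2)   # minor-allele homozygotes only
    if model == "log-additive":
        return count             # 0, 1 or 2 copies of the minor allele
    raise ValueError("unknown model: " + model)


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


def best_model(aic_by_model):
    """Return the name of the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# Hypothetical AIC values for one SNP, for illustration only.
example_aics = {"dominant": 210.0, "recessive": 215.5, "log-additive": 208.3}
chosen = best_model(example_aics)
```

Because all three models here use the same number of parameters, comparing their AIC values is equivalent to comparing their log-likelihoods.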