Validating an Amharic Version of the 36-item Short Form Health Survey (SF-36) in Individuals With Leprosy in Ethiopia.

Background: Health-related quality of life (HRQoL) has become an indispensable outcome measure in many randomized clinical trials and other studies. It provides the patient's voice in measuring health improvement or decline and in assessing treatment effectiveness. A validated Amharic HRQoL assessment tool was needed for leprosy clinical trials in Ethiopia. The SF-36 was chosen, but a validated Amharic version was not available. We describe how one was developed. Methods: The SF-36 was translated from English into Amharic and evaluated for content acceptability in a patient focus group. Back translation was performed. The validity and reliability of the Amharic SF-36 were tested in 100 patients with leprosy attending the leprosy clinic at ALERT hospital, and the instrument was compared with the Amharic version of the WHOQOL-BREF. Results: Amharic translations of both the WHOQOL-BREF and the SF-36 had good reliability and validity amongst leprosy affected individuals. Internal consistency reliability estimates for each domain/scale exceeded 0.70. The Amharic SF-36 had better convergent and discriminant validity than the WHOQOL-BREF in this group of patients. Good known-group validity was seen for both the WHOQOL-BREF and the SF-36 in leprosy affected patients. The Amharic SF-36 had good inter-rater reliability, with seven of the eight domains scoring above 0.8 on the intra-class correlation coefficient. Conclusion: This Amharic version of the SF-36 is a valid instrument to measure HRQoL in clinical trials involving leprosy affected individuals in Ethiopia.
GH from showed weak to moderate associations with domains but association with implying that both measured similar concepts, and that participants responding to this question consistently with both questionnaires. The hypothesis that domain/scale scores correlated to self-perceived health status SF-36 in this group of patients. The weak associations occurred between the social domain of and all of Moderate associations were found between the physical domain of the WHOQOL-BREF and the PF, RP, BP, GH and VT scales of the SF-36 (r range 0.28-0.47), and between the psychological domain of the WHOQOL-BREF and the PF, BP, RP, VT and MH scales of the SF-36 (r range 0.32-0.46). The highest correlations were found between the physical domains of WHOQOL-BREF and RE and MH of the

perceived change in health during the last 12 months. Two summary scores can be calculated: a mental health component summary score (MCS) and a physical health component summary score (PCS).
Four studies have reported using the SF-36 in Ethiopia. A study published in the Ethiopian Medical Journal (20) evaluated the SF-36 in a general health survey in order to establish general population norms and to describe the effects of socio-demographic factors on SF-36 scores. It concluded that the Amharic SF-36 had acceptable psychometric properties and construct validity. The translation system used was not reported. This version was later used to assess HRQoL in 271 individuals with schizophrenia (21), 315 individuals with bipolar disease (22) and 420 people living with HIV and on anti-retroviral therapy (23).
Unfortunately, despite many attempts, we were not able to obtain a copy of this Amharic version of the SF-36, and the developers of SF-36 did not hold an Amharic version.
Although cross validation of item selection and scoring of the SF-36 has been done (24), this has often been done on patients living in developed countries with similar standards of living. At first glance, the face validity of some SF-36 items appears questionable for patients in low-income settings, for example questions about "playing golf", "bowling", "pushing a vacuum cleaner" and "climbing several flights of stairs" in a country where only urban buildings have several floors. This observation pointed to a need to explore the construct validity of the SF-36 before adopting it for use with leprosy patients in our study in Ethiopia.
Validation of a translated questionnaire can be done by comparing its reliability and validity with those of a validated QOL tool in that language. Previous comparisons between the SF-36 and the WHOQOL-BREF have been done successfully in patients with HIV, showing good correlations between the corresponding domains/scales of the two instruments (25). In our study, the Amharic SF-36 was validated by comparison with an already validated Amharic WHOQOL-BREF (16, 26). A further measure of validity for the SF-36 in leprosy patients was known-group validity, assessed by comparing SF-36 scores with symptom frequency and symptom severity. We decided to select the SF-36 for our clinical trial if it proved superior to the WHOQOL-BREF.

Methodology

Study setting
The study took place at ALERT hospital, a tertiary referral centre for leprosy, in Addis Ababa, Ethiopia.
Study aim: To validate an Amharic version of the SF-36.
We hypothesized that if both instruments captured the health-related QOL of leprosy patients, then:
1. The corresponding domains/scales of both instruments should be positively correlated, i.e. the physical, psychological, and social domains of the WHOQOL-BREF should be significantly correlated with the PF, MH and SF scales of the SF-36, respectively.
2. The physical and psychological domains of the WHOQOL-BREF should have weak associations with the MCS and PCS of the SF-36, respectively.
3. The domain/scale scores of both instruments should be positively correlated with self-perceived health status (question 2 in both instruments).
4. The domain/scale scores of both instruments should be inversely correlated with the number and intensity of leprosy-related symptoms.

Sample size
A review of the published literature showed that validation studies usually had a minimum sample size of 30 participants. We decided on 50 participants for the comparison of the SF-36 with the WHOQOL-BREF and 50 participants to assess inter-rater reliability, giving a total of 100 participants for the analysis of the relationship between SF-36 scores and leprosy symptoms.

Ethical approval
Ethical approval was obtained as part of the larger clinical trial from the Ethics Committee of the London School of Hygiene and Tropical Medicine (5376), the ALERT and AHRI Ethical Review Committee (AA/ht/248/09), and the National Ethics Review Committee of Ethiopia (RDHE/34-90/2009).
Written informed consent was obtained in Amharic. Data were anonymised and stored in a password-protected Access database.

Instrument translation and adaptation
The SF-36 questionnaire was translated by two native Amharic speakers fluent in English. The translators, two doctors, a social worker and a nurse reviewed the translation to ensure that it replicated the original as closely as possible but was appropriate for the socio-economic and cultural setting. For example, "pushing a vacuum cleaner, bowling, or playing golf" were removed, leaving only "moving a table" as an example of moderate activity. The two previous reports on the use of the Amharic SF-36 mentioned that "climbing stairs" was replaced by "walking up a hill" in their translation, but we felt comfortable using "climbing stairs" in an urban setting, with "walking up a hill" as a second option.
Distances in miles and yards were changed to kilometres and metres, which are more commonly used in Ethiopia. Our Amharic version was then discussed in a focus group of two doctors, two nurses, an occupational therapist and seven patients. The patients were of various ages and leprosy experience: two were newly diagnosed, three had leprosy reactions and two were long-term patients who had attended for management of neuropathic ulcers. After minor changes in wording, a final version was agreed and back translated into English by an independent translator. The new English translation was then compared with the original SF-36 for conceptual equivalence and found to be satisfactory by two native English speakers. The final Amharic SF-36 version (Appendix 2) was then tested for validity and reliability.

Validity and reliability of Amharic SF-36 in leprosy patients
Amharic-speaking patients with leprosy were interviewed, after informed consent had been obtained, by a nurse or social worker trained in questionnaire administration.
Demographic and clinical data were collected using a standard form (Appendix 1). Disability grading was assessed using the Eye Hand Foot (EHF) score recommended by the WHO which has a range of 0-12. The higher the score the greater the disability (27).
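The EHF score is the sum of the WHO disability grades (0, 1 or 2) recorded for each eye, hand and foot, which gives the 0-12 range quoted above. A minimal sketch of that sum is given below; the function name and the dictionary keys are assumptions made for illustration and do not come from the study's data form.

```python
# Illustrative sketch only: names and data layout are hypothetical.
SITES = (
    "left_eye", "right_eye",
    "left_hand", "right_hand",
    "left_foot", "right_foot",
)


def ehf_score(grades):
    """Sum the WHO disability grade (0, 1 or 2) over the six sites."""
    total = 0
    for site in SITES:
        grade = grades[site]
        if grade not in (0, 1, 2):
            raise ValueError(f"invalid grade {grade!r} for {site}")
        total += grade
    return total


if __name__ == "__main__":
    example = {site: 0 for site in SITES}
    example["left_hand"] = 2
    example["right_foot"] = 1
    print(ehf_score(example))
```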
Participants were allocated alternately to one of two groups. Participants in Group A were interviewed by the same interviewer, on the same day, with two different questionnaires: the Amharic WHOQOL-BREF and the Amharic SF-36. Participants in Group B were interviewed separately by two interviewers, on the same day, with the Amharic SF-36. Each interviewer was blinded to the other's interview results.

Statistical analysis
Reverse-scored items were adjusted in the SF-36 for questions SF02, GH02, GH04, VT03, VT04, MH01, MH02 and MH04, and in the WHOQOL-BREF for questions 3, 4 and 26. The scoring systems recommended by the tool developers were followed for both the WHOQOL-BREF (15) and the SF-36 (19). The data were then analysed for the aspects reported below.
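The two generic steps involved in this kind of scoring, reversing negatively worded items and rescaling a raw scale sum to 0-100, can be sketched as follows. This is an illustration only and not the developers' scoring algorithm: the function names, the 1-5 response range and the ten-item scale coded 1 to 3 are assumptions made for the example, and the published scoring manuals remain the authority for item recoding.

```python
# Illustrative sketch only: not the official SF-36 or WHOQOL-BREF scoring.

def reverse_item(value, low, high):
    """Reverse-score an item coded on the integer range low..high."""
    if not low <= value <= high:
        raise ValueError(f"response {value} outside {low}..{high}")
    return low + high - value


def scale_to_100(raw_sum, lowest_possible, highest_possible):
    """Linearly rescale a raw scale sum so that it runs from 0 to 100."""
    if highest_possible <= lowest_possible:
        raise ValueError("highest_possible must exceed lowest_possible")
    span = highest_possible - lowest_possible
    return 100.0 * (raw_sum - lowest_possible) / span


if __name__ == "__main__":
    # Hypothetical item answered 4 on a 1-5 scale, then reversed.
    print(reverse_item(4, 1, 5))
    # Hypothetical ten-item scale coded 1-3, so raw sums run from 10 to 30.
    print(scale_to_100(20, 10, 30))
```

Higher transformed scores then indicate better health on every scale, which is what allows the domain scores to be compared directly.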

Baseline Characteristics
The characteristics of the 100 participants are summarized in Table 1.

Table 1. Characteristics of 100 patients with leprosy enrolled in this study.

There was a 1:3 ratio of females to males in the group of 100 participants interviewed. Although only 2% had received tertiary education, 58% had been to school and were literate. Rural residents made up 27% of the participants interviewed. Most participants (81%) attending the clinic were being treated for a reaction and were on steroids; only 31% were acutely unwell on the day of the interview. A total of 41% were on MDT.
All the participants were interviewed; none self-completed the questionnaires.

Descriptive statistics for the WHOQOL-BREF vs SF-36 comparison
Each of the 50 Group A participants interviewed had their scores analysed by domains for both questionnaires. The score distribution is shown in The physical and environmental domains of WHOQOL-BREF and six out of the 8 scales of the SF-36 were positively skewed, indicating distributions with more participants scoring lower than the mean group score.
All four domains of the WHOQOL-BREF had trivial floor and ceiling effects. A ceiling effect is measured by the proportion of people obtaining the highest possible score, whilst a floor effect reflects the proportion of people receiving the lowest possible score. The highest ceiling effect was noted in the physical functioning (PF) scale of the SF-36 (34%), indicating that one third of participants were able to perform physical activities without limitations. Noteworthy ceiling effects were observed for the role-disability scales of the SF-36 (24% for role physical (RP) and 22% for role emotional (RE)), indicating that almost one quarter of individuals affected by leprosy did not feel that their physical health or emotional problems resulted in difficulties with work or other activities. A modest ceiling effect was observed for social functioning (SF), with 20% of participants able to perform social activities without interference.
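The definitions of floor and ceiling effects given above translate directly into a proportion calculation. The sketch below assumes scale scores that run from 0 to 100; the function name and the example scores are hypothetical and are not the study data.

```python
# Illustrative sketch only: example scores are invented.

def floor_ceiling(scores, lowest=0.0, highest=100.0):
    """Return (floor %, ceiling %) for a list of scale scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    at_floor = sum(1 for s in scores if s == lowest)
    at_ceiling = sum(1 for s in scores if s == highest)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


if __name__ == "__main__":
    # 50 hypothetical participants: 17 at the maximum, 1 at the minimum.
    scores = [100.0] * 17 + [0.0] * 1 + [55.0] * 32
    floor_pct, ceiling_pct = floor_ceiling(scores)
    print(floor_pct, ceiling_pct)
```

With 50 respondents, each participant contributes two percentage points, so a 34% ceiling effect corresponds to 17 people at the maximum score.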

Convergent and discriminant validity
The inter-domain/scale correlations of the WHOQOL-BREF and the SF-36 are presented in Table 6. The correlations between the physical and psychological domains of the WHOQOL-BREF and the PF and MH scales of the SF-36 were 0.33 and 0.46 respectively, but the association between the social domain and the SF scale was low (r = 0.15). The first hypothesis, that the corresponding domains/scales of both instruments should be positively correlated, is therefore partially supported. (19,28,29).
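Each entry in such a correlation table is a coefficient computed from paired domain and scale scores for the same participants. The sketch below uses Pearson's r, which is an assumption: the text does not state which coefficient was used, and a rank-based coefficient would be computed on ranks instead. The function name and data are hypothetical.

```python
# Illustrative sketch only: the choice of Pearson's r is an assumption.
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical paired scores for five participants.
    whoqol_physical = [40.0, 55.0, 60.0, 70.0, 85.0]
    sf36_pf = [35.0, 60.0, 50.0, 80.0, 90.0]
    print(round(pearson_r(whoqol_physical, sf36_pf), 2))
```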
Overall, the validity results showed that the SF-36 has better convergent and discriminant validity than the WHOQOL-BREF in this group of participants. The social domain of the WHOQOL-BREF showed particularly poor correlation, which might be related to the small number of questions in this domain or to its poor internal consistency (Cronbach's α = 0.652).
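Cronbach's α, quoted above for the social domain, compares the sum of the item variances with the variance of the total score: α = k/(k-1) × (1 - Σ item variances / total variance), where k is the number of items. A minimal sketch follows; the function name and the response data are hypothetical.

```python
# Illustrative sketch only: response data are invented.
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items per respondent")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("alpha undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Four hypothetical respondents answering a three-item domain.
    responses = [[3, 4, 3], [2, 2, 3], [5, 4, 4], [1, 2, 1]]
    print(round(cronbach_alpha(responses), 3))
```

Because α depends on k, a domain with only a few items tends to yield a lower value than a longer scale of comparable item quality, which is consistent with the explanation offered above.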
Known group validity
Table 7

Inter-rater reliability
One way of performing reliability testing is to use the intra-class correlation coefficient (ICC). It can be defined as "the proportion of variance of an observation due to between-subject variability in the true scores". The ICC may range between 0.0 and 1.0. The ICC will be high when there is little variation between the scores given to each item by the raters, e.g. if all raters give the same, or similar, scores to each of the items.
The ICC is an improvement over Pearson's r and Spearman's ρ, as it takes into account the differences in ratings for individual segments, along with the correlation between raters (30).
In this study intra-class correlation was calculated using ICC (2), the "Two-Way Random" method, which works on two assumptions: 1) it models both an effect of rater and of ratee (i.e. two effects) and 2) it assumes both are drawn randomly from larger populations (i.e. a random effects model). Mean rating was selected, computing first the mean of each of the 8 domains of the SF-36 (PF, RP, BP, GH, VT, SF, RE, MH) for each of the 50 participants in both sets of interviews. The measure of consistency was chosen, as this is recommended when comparing means, and results are summarized in Table 8.
a. Type C intra-class correlation coefficients using a consistency definition: the between-measure variance is excluded from the denominator variance.
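A consistency-type ICC of this kind can be derived from the mean squares of a two-way layout (participants by raters). The sketch below is not the software routine used in the study; the function name and the ratings are hypothetical. It returns both the single-measure form, (MSR - MSE) / (MSR + (k - 1) MSE), and the average-measure form, (MSR - MSE) / MSR, where MSR is the between-participant mean square, MSE the residual mean square and k the number of raters.

```python
# Illustrative sketch only: ratings are invented.

def icc_consistency(ratings):
    """Two-way consistency ICC; ratings holds one row of k scores per subject.

    Returns (single_measure_icc, average_measure_icc).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need the same number (>= 2) of raters per subject")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Between-subject sum of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    # Residual sum of squares after removing subject and rater effects.
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("ICC undefined when subjects do not differ")
    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


if __name__ == "__main__":
    # Hypothetical scale scores from two interviewers for four participants.
    scores = [[60.0, 65.0], [40.0, 35.0], [80.0, 85.0], [55.0, 50.0]]
    print(icc_consistency(scores))
```

Under the consistency definition a constant offset between raters does not lower the coefficient, because the rater (between-measure) variance is left out of the denominator.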
An intra-class correlation of 0.7 is deemed acceptable, above 0.8 is optimal, and a score above 0.9 would be considered excellent inter-rater reliability. Our results show that inter-rater reliability was excellent for four of the eight domains of the SF-36, three were in the optimal range and one, social functioning, was in the acceptable range. The p-values, all under 0.001, were statistically significant.

Discussion
This study assessed the reliability and validity of our Amharic version of the SF-36 by comparing it to the Amharic WHOQOL-BREF in measuring HRQoL in individuals with leprosy.
The Amharic SF-36 was translated following standard procedures (31). Interviewers reported that it was easy to use and that most participants understood the questions.
Descriptive statistics revealed positively skewed score distributions of the WHOQOL-BREF domains and SF-36 scales, indicating that more patients scored below the mean group scores. Given that 69% of patients interviewed were attending hospital because they were unwell, this result would be expected and would be an indication of validity. This was further supported by the high ceiling effect noted in the PF scale of the SF-36 (34%), with 24% for RP and 25% for RE, supporting the theory that a large proportion of our patient group would have some limitations in physical functioning and in work/social activities. The inter-rater reliability was very good, with all the scales scoring between the acceptable and excellent range (32).
At the commencement of our study, there were two published studies assessing quality of life using the SF-36 in leprosy in a clinical situation. Both studies were based in Brazil. One assessed quality of life in 107 patients attending a health facility for leprosy treatment (13) and the second assessed quality of life in 49 patients on treatment for paucibacillary (PB) leprosy (14). The second study found that quality of life scores were not affected in 63% of patients with PB leprosy. Most of these patients were diagnosed early, with no leprosy reaction or nerve function impairment.
The study of Lustosa et al. (13) found that patients with reactions, increased disability grades and a perception of stigma had significantly lower scores in all scales of the SF-36. Recently, more studies of people affected by leprosy have reported using the SF-36 to assess HRQoL. In Bangladesh, 29 patients with erythema nodosum leprosum (ENL), a chronic and often severe complication of leprosy, had significantly worse HRQoL in all 8 domains of the SF-36 than 46 leprosy patients without ENL (33). Similar findings from Brazil described the effect of ENL on the HRQoL of leprosy patients as measured by the SF-36 (34). In 2019, a Brazilian study compared the SF-36 and the DLQI in leprosy patients, finding that the SF-36 was the better HRQoL assessment tool because it covers non-dermatological sequelae of leprosy such as body or nerve pain, and disability (35). The use of the SF-36 in African leprosy patients is reported here for the first time.
The Amharic SF-36 scores in our sample of 100 Ethiopians with leprosy were much lower than the Ethiopian normative data (20). The difference was more marked in the bodily pain and social functioning scales. This may be explained by the fact that 81% of patients interviewed were on treatment for leprosy reaction, 31% were acutely unwell on the day of the interview and 46% had severe symptoms. The significant relationship between reduced HRQoL and physical pain has been described previously in other studies (10). The lower social functioning scores of individuals with leprosy may be a reflection of the stigma that exists in leprosy. The scores in both the emotional and physical role scales were lower in leprosy affected people, indicating difficulties with work or other activities as a result of physical health and emotional problems.
In our study, strong correlations were found between higher grades of disability (determined by EHF scores) and lower SF-36 scores, in particular in PF, BP, SF, RE, MH, PCS, and MCS. The correlation between greater severity of symptoms and lower HRQoL scores was statistically significant in all the scales of the SF-36. This was also mostly true for the number of symptoms experienced and for patients who were unwell on the day of the interview. These differences in Amharic SF-36 scores between patient categories indicate that the questionnaire has good construct validity.
The study is not without limitations. Self-administered questionnaires are preferable. Using interviewers is essential in populations where literacy rates and levels of education are low, but conversations and explanations may have influenced some of the answers.

Conclusion
Our Amharic version of the SF-36 is valid and reliable. We are confident that this instrument is useful to measure HRQoL in clinical trials involving Ethiopian participants with leprosy.