Validity and Reliability of the Amharic Version of the HLS-EU-Q47 Survey Questionnaire among Urban School Adolescents and University Students in Dire Dawa, Ethiopia

Background: No comprehensive tool is available for assessing health literacy across different languages, contexts and population structures other than the European health literacy survey questionnaire (HLS-EU-Q47), which has been widely used, adapted and tested in different countries and languages. However, it has not been tested in Ethiopian populations. This study aimed to validate and test the reliability of the Amharic version of the HLS-EU-Q47 survey questionnaire (HLS-Amh) among school adolescents and university students in Dire Dawa, Ethiopia. Method: A cross-sectional study with multistage random sampling was conducted among urban school adolescents and university students from public schools and Dire Dawa University in Dire Dawa city, Ethiopia, Africa. After the HLS-EU-Q47 was translated into Amharic by translation and back-translation, data were collected with a self-reported questionnaire from a sample of 744 participants, with a 9% non-response rate, in October and November 2018. Confirmatory factor analysis and correlation analysis were done using SPSS and AMOS. Goodness-of-fit indices, item-scale convergent validity, Pearson correlation coefficients, floor and ceiling effects, Cronbach's alpha and the split-half Spearman-Brown coefficient were computed, with cut-off values taken from guidelines and the literature. Ethical issues were considered, and informed consent was obtained from institutions and participants. Result: The Amharic version of the HLS-EU-Q47 (HLS-Amh) was reliable but weak in validity for measuring health literacy among urban school adolescents and university students in Dire Dawa, Ethiopia. The goodness-of-fit indices (GFI, AGFI, CFI and IFI) were between 0.80 and 0.90. Although the RMSEA was less than 0.10, the other indices were insufficient to indicate a good model-data fit; the model did not satisfy the model-fit criteria required for validity and showed marked floor/ceiling effects.
However, it showed a high level of internal consistency reliability, with a relatively high Cronbach's alpha coefficient (α = 0.910). Conclusions: The HLS-Amh was reliable but weak in validity in these population groups. It can be used for general surveys of awareness and knowledge rather than for screening substantial and clinically related inquiries. It needs


Background
Health literacy has become a significant concept and one of the major global health priorities for predicting the factors and mechanisms of health disparities [1]. Essentially, it is a cognitive and social skill, described as the extent to which people have the knowledge, motivation and ability to acquire, process, comprehend and use health information in health-related decisions [2,3]. The concept evolved implicitly from Gandhi's campaigns and efforts in India, as well as from the work of volunteers in Africa, to promote literacy and health [3,4]. The term itself was later introduced in the United States of America in 1974 by Scott K. Simonds in his work on health education as a policy issue in schools for all grades [5]. Today it is growing rapidly and becoming one of the new healthcare and public health priorities [6]. Its conceptual, theoretical, and practical advances and intervention programs have been expanding [7].
Health literacy measurement tools were introduced in the United States of America by adapting literacy tests to healthcare settings. They focused on the ability to read and understand health care messages and medical instructions provided in texts and written documents, including numerical labels on prescriptions, medicines and appointment slips [8]. The main tools were the rapid estimate of adult literacy in medicine (REALM) [9], the test of functional health literacy in adults (TOFHLA) [10], the wide range achievement test (WRAT), the health activities literacy scale (HALS) [11] and the newest vital sign (NVS) [12]. These measurement tools and other similar instruments were developed based on functional literacy (reading and writing ability) and expressed qualitatively.
Health literacy is a multidimensional and multifaceted construct [13], so indicators and measurement tools vary [14,15]. Taking this into account, the European Health Literacy Survey (HLS-EU) project, which took place in 2009-2012 with support and funding from the European Commission and the national partners in the HLS-EU Consortium, aimed to measure health literacy in Europe, establish networks and promote health literacy in Europe [16][17][18]. As a starting point for measuring health literacy in Europe, the HLS-EU Consortium adopted an "all-inclusive" definition of health literacy from the literature and framed a conceptual model [16] that describes health literacy as individuals' knowledge, motivation and ability to access, process, understand, appraise, and apply health information for making judgments and decisions in daily life about maintaining and improving quality of life through healthcare, disease prevention and health promotion [2,16,17,19,20]. The core version (HLS-EU-Q47), which consists of 47 items and is aligned with this definition and conceptual model, is widely used. It has been adapted and tested for reliability and validity in different countries, languages and cultural settings, in general populations and in specific age groups, and has mostly been found valid and reliable, with some limitations detected [2,3,21-29].
However, there is still no universally agreed definition of health literacy, which leads to conceptual and construct discrepancies across languages and makes translation fraught with difficulty [3]. Furthermore, comprehensive instruments to measure health literacy across different contexts and settings are uncommon [28]. Such studies have also not included African countries, including the Ethiopian population in general and school adolescents and university students in particular. Therefore, the purpose of this study was to validate and test the reliability of the factor structure of the Amharic version of the HLS-EU-Q47 survey questionnaire (HLS-Amh) using data from urban school adolescents and university students in Dire Dawa, Ethiopia.

Methods And Materials

Study design and setting
A cross-sectional study was designed and conducted among urban school adolescents in junior and secondary grades and university students taking first-year courses. Participants were sampled randomly from public schools and Dire Dawa University in Dire Dawa city, Ethiopia, Africa.
The study area is located in the eastern part of Ethiopia and is one of the two self-administering city provinces in the country, approximately 515 km from the capital, Addis Ababa. from Dire Dawa University across different departments. The mean age of participants of both genders was 18.60 ± 3.175 years (range 10-25 years); 309 (45.64%) were female and the remaining 368 (54.36%) were male.
The HLS-EU-Q47 was tested for validity and reliability in its Amharic version (HLS-Amh) because Amharic is the official federal working language of Ethiopia and is also a working language in five states, including Dire Dawa city.
The HLS-EU-Q47, which was developed and validated by the European health literacy consortium in European contexts and contains 47 items rated on 4-point Likert scales (1 = very difficult, 2 = difficult, 3 = easy, and 4 = very easy) [18,20], was used. It was translated into Amharic using the translation and back-translation method by language and subject-area professionals, taking into account the cultural and societal contexts. The questionnaire was then pre-tested. Data were collected with a self-reported questionnaire from the end of October through the second week of November 2018 by the principal author, with the help of two assistants from Dire Dawa University academic faculties who had received orientation, in collaboration with school principals and academic program heads.

Statistical Analysis
Data were screened for missing and inappropriate responses, and such cases were dropped from the analysis by listwise deletion. Outliers were also deleted, and univariate and multivariate normality were checked. Maximum likelihood (ML) estimation was used under the assumption of multivariate normality (MVN).
Cut-off values were taken from guidelines on reporting the results of structural equation modelling, specifically confirmatory factor analysis [30][31][32][33][34][35]. The analysis was done, along with correlation and descriptive analyses, using SPSS version 23 and Analysis of Moment Structures (AMOS version 23), an add-on module for SPSS used for structural equation modelling, path analysis, and confirmatory factor analysis.
Statistical analysis focused on testing validity and reliability. Validity was assessed with confirmatory factor analysis (CFA) for general health literacy and for the three separate health literacy domains of health care, disease prevention, and health promotion. Items were loaded onto hypothetical factors (finding, understanding, judging, and applying health information). Goodness-of-fit indices were used to test the fit of the data to the model, most importantly (i) absolute model fit: root mean square error of approximation (RMSEA) and goodness-of-fit index (GFI); (ii) incremental fit: adjusted goodness-of-fit index (AGFI), comparative fit index (CFI), incremental fit index (IFI), and normed fit index (NFI); and (iii) parsimonious fit, or the chi-square goodness-of-fit test (i.e., the chi-square/degrees of freedom ratio [χ²/df ratio]), on the assumption that the more indices are satisfied, the better the construct validity of the questionnaire [30,34]. Item-scale convergent validity was examined using the correlation between each item and its theoretical scale [18], measured with the Pearson correlation coefficient. An r value between 0.36 and 0.67 was considered moderately correlated; r values between 0.68 and 1.0 were considered highly correlated [30,34,36].
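The item-scale convergent validity check described above can be illustrated with a minimal Python sketch. It is not the authors' SPSS procedure; the function names, the label "below moderate" for r values under 0.36, the use of the absolute value of r, and the sample data are all assumptions made for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def classify_correlation(r):
    """Label r using the cut-offs quoted in the text (0.36 and 0.68).

    Assumption: the magnitude of r is used, and anything under 0.36 is
    labelled 'below moderate' (the text does not name that band).
    """
    magnitude = abs(r)
    if magnitude >= 0.68:
        return "high"
    if magnitude >= 0.36:
        return "moderate"
    return "below moderate"

# Hypothetical data: one item (1-4 Likert) and the respondents' scale scores.
item_scores = [1, 2, 2, 3, 4, 4, 3, 2]
scale_scores = [40, 55, 60, 70, 90, 85, 75, 50]
r = pearson_r(item_scores, scale_scores)
print(round(r, 3), classify_correlation(r))
```

Applied to the range reported in this study, `classify_correlation(-0.022)` and `classify_correlation(0.450)` would return "below moderate" and "moderate", respectively.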
Reliability analysis was done to test internal consistency with Cronbach's alpha, and values greater than or equal to 0.7 were taken as an indicator of satisfactory reliability [37]. The split-half reliability was also examined [38]. Because such a large-scale survey may have limited responsiveness, floor and ceiling effects, which refer to a high percentage of participants obtaining the lowest or the highest possible score, respectively, were of concern [36]. Following the recommended minimal cut-offs for significant floor and ceiling effects, a percentage of 15% or more at the floor or ceiling of the HLS-EU-Q47 scale was considered a significant effect [36]. Statistical analyses were done using SPSS version 23 and Analysis of Moment Structures (AMOS version 23). The significance level was set at P < 0.05, with 95% confidence intervals.
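For readers who want to see how the two reliability statistics are computed, the following is a minimal Python sketch of the standard Cronbach's alpha formula and the Spearman-Brown correction for a split-half correlation. It is not the SPSS routine used in the study; the function names and the response matrix are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per item), using sample variances."""
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def spearman_brown(r_half):
    """Step up the correlation between two half-tests to full-test length."""
    return 2 * r_half / (1 + r_half)

# Hypothetical answers of five respondents to four 4-point Likert items.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}, satisfactory (>= 0.70): {alpha >= 0.70}")
print(f"Spearman-Brown for r_half = 0.50: {spearman_brown(0.50):.3f}")
```

How the 47 items were divided into halves is not stated in the text, so the sketch takes the half-test correlation as an input rather than assuming a particular split.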

Measurements Of Construct Validity
Confirmatory factor analysis (CFA) was employed to test construct validity. The goodness-of-fit indices (RMSEA, GFI, AGFI, CFI, NFI, and IFI) for the fit of the data to the hypothetical model are shown for general health literacy and the three domains of health literacy, for participants overall and for school adolescents and university students separately (Table 4). The RMSEA index was less than 0.10, and the other goodness-of-fit indices (GFI, AGFI, CFI and IFI) were between 0.80 and 0.90 for most domains, for participants overall and for both school adolescents and university students. The NFI scores for all domains in both groups were lower than 0.80. In particular, the AGFI was approximately 0.90 for general health literacy (Gen-HL), disease prevention health literacy (DP-HL), health care health literacy (HC-HL) and health promotion health literacy (HP-HL) in both categories, while the NFI was below 0.80 for these domains, except for DP-HC and HP-HL in participants overall. As shown in Fig. 1, the item-total correlations ranged from 0.287 to 0.542.

Item-scale Convergent Validity
Most items did not show satisfactory item-scale convergent validity. The items correlated very weakly with each other (rho ranging from −0.022 to 0.450), with variation within this range from general health literacy to each of the three sub-domains for both categories of participants (Table 4).

Floor And Ceiling Effects
There were significant floor and ceiling effects, as the percentages of participants with the lowest or the highest mean and individual item scores of health literacy were greater than 15%. The percentage of scores at the floor ranged from 12.40% to 31.40%, with a mean of 20.30%, for general health literacy in all participants combined. In addition, the percentage of scores at the ceiling ranged from 10.80% to 33.53%, with a mean of 20.88%. Similarly, floor and ceiling effects were high and significant among urban school adolescents and university students separately, for general health literacy and for the three sub-domains (Table 4).
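The 15% criterion for floor and ceiling effects can be made concrete with a short Python sketch. The function name, the returned dictionary keys and the sample scores are hypothetical; only the 15% cut-off and the 1-4 response range come from the text.

```python
def floor_ceiling_effects(scores, min_score, max_score, cutoff_pct=15.0):
    """Percentage of respondents at the lowest and highest possible score,
    flagged as significant when the percentage reaches cutoff_pct."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct >= cutoff_pct,
        "ceiling_effect": ceiling_pct >= cutoff_pct,
    }

# Hypothetical answers of ten respondents to one 4-point item (1 to 4).
item_answers = [1, 1, 2, 3, 4, 4, 4, 2, 3, 3]
result = floor_ceiling_effects(item_answers, min_score=1, max_score=4)
print(result)
```

In this toy example 20% of the answers sit at the floor and 30% at the ceiling, so both effects would be flagged under the 15% criterion.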

Measurements Of Reliability
The reliability of the Amharic version of the HLS-EU-Q47 among these urban adolescents and university students was relatively high (Cronbach's alpha coefficient α = 0.910). The internal consistency (Cronbach's alpha) for the 47 items was higher than 0.90 among university students and 0.88 among school adolescents. Most of the sub-scales had high internal consistency, with values ranging from 0.787 to 0.935, except DP-HL in school adolescents, which was 0.743. Likewise, the split-half Spearman-Brown coefficients ranged from 0.621 for disease prevention health literacy among school adolescents to 0.88 for general health literacy among university students and were mostly satisfactory for Gen-HL and the three domains (HC-HL, DP-HL, and HP-HL) (Table 4).

Discussion
To the authors' knowledge, the present study was the first to investigate the psychometric properties of the HLS-EU-Q47 among Ethiopian population groups. The HLS-EU-Q47 does not measure functional health literacy [20]. It does not involve any objective items, because it is a subjective, individual-oriented measure administered exclusively by pencil-and-paper testing [23]. Thus, the study took these facets into account and focused only on the constructs of the HLS-EU-Q47.
The study showed that the Amharic version of the HLS-EU-Q47 (HLS-Amh) was a reliable but not a valid tool for measuring health literacy among urban school adolescents and university students in Dire Dawa, Ethiopia. The results did not satisfy the model-fit indices required for validity [30,32-34,39] and did not show adequate item-scale convergent validity; floor/ceiling effects were marked, although internal consistency reliability was high.
Regarding construct validity, the data did not fit the hypothetical model well for general health literacy or for the three domains of health literacy, in participants overall or in school adolescents and university students separately. Although the RMSEA index was less than 0.10, the other goodness-of-fit indices (GFI, AGFI, CFI and IFI) were less than 0.90 but greater than 0.80 for most domains, for participants overall and for both school adolescents and university students. The NFI scores for all domains in both groups were lower than 0.80, which is insufficient to indicate a good model-data fit [34,38]. In particular, the AGFI was approximately 0.90 for general health literacy (Gen-HL), disease prevention health literacy (DP-HL), health care health literacy (HC-HL) and health promotion health literacy (HP-HL) in both categories, while the NFI was below 0.80 for these domains, except for DP-HC and HP-HL in participants overall, representing a fit that was not tolerable for validity [30,32-34,39].
The overall results did not support the fit of the four-factor structure within each of the three domains of the Amharic version of the HLS-EU-Q47 (HLS-Amh) among the sampled Ethiopian urban school adolescents and university students (Table 4). As shown in Table 3 and Fig. 1, the factor loadings (item-factor correlations), which reflect the estimated population factor structure, ranged from 0.287 to 0.542, indicating that each item contributed little to measuring the factor (health literacy). Several guidelines in the literature state that factor loadings should be > 0.80 so that each item contributes to the factor and the dimensions of the factor are sufficiently accounted for by that item [40,41]. For example, a factor loading of 0.30 corresponds to approximately 10% of the variance explained and a factor loading of 0.50 to about 25%, so the loading must be greater than 0.70 for the factor to account for 50% of the variance of a variable [42].
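The percentages quoted above follow from squaring the loading. The following minimal Python sketch assumes standardized loadings, so that the squared loading is the share of an item's variance accounted for by the factor; the function name is hypothetical.

```python
def variance_explained(loading):
    """Share of item variance accounted for by the factor,
    assuming a standardized factor loading."""
    return loading ** 2

# Rule-of-thumb loadings from the text, then the range observed in this study.
for loading in (0.30, 0.50, 0.70, 0.287, 0.542):
    share = 100 * variance_explained(loading)
    print(f"loading {loading:.3f} -> {share:.1f}% of item variance")
```

Under this assumption, the lowest and highest loadings reported here (0.287 and 0.542) correspond to roughly 8% and 29% of item variance, well short of the 50% associated with a loading of about 0.70.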
The evaluation of the factorial structure of the data from Ethiopian urban school adolescents and university students in Dire Dawa suggests that, as far as reliability is concerned, the HLS-Amh can be used with both total and sub-dimension scores. However, the item loadings of these constructs were below 0.5, which indicates weakness and instability. As a rule of thumb, if the item loadings on a factor are below 0.50, the factor is considered unstable [42]. As the item loadings on the scale factors were inadequate, it is concluded that the construct validity of the HLS-Amh was not supported. Further evidence anticipated for construct validity was statistically significant and high correlations between the HLS-Amh total and sub-dimension scores, since individuals with better health care literacy would be expected to be literate in general health literacy, disease prevention, and health promotion as well. In this study, the Pearson correlation coefficients between Gen-HL and the three sub-dimension scores were 0.780, 0.836 and 0.807 for HC, DP and HP, respectively, and the correlation coefficients between these sub-dimensions were also high (Table 2).

These values were close to the results of several studies in European and Asian countries, and a study in Brazil showed similar results [15,19,25-29]. Regarding item-scale convergent validity, most items showed very weak correlations (rho), ranging from −0.022 to 0.450, with variation within this range for general health literacy and for each of the three subscales in both categories of participants (Table 3).
Floor and/or ceiling effects are considered present when more than 15% of respondents have the lowest or the highest possible score, respectively [36]. Accordingly, there were significant floor and ceiling effects, as the percentages of participants with the lowest and the highest health literacy mean and individual item scores were far higher than 15%. The percentage of scores at the floor ranged from 12.40% to 31.40%, with a mean of 20.30%, for general health literacy in all participants combined. In addition, the percentage of scores at the ceiling ranged from 10.80% to 33.53%, with a mean of 20.88%. Similarly, floor and ceiling effects were high and significant for urban school adolescents and university students separately, for general health literacy and for the three sub-domains (Table 3). The presence of floor and/or ceiling effects implies that extreme scores were obtained for items at the lower or upper end of the scale, indicating limited content validity, which in turn affects reliability [28,36]. These results indicate that the responsiveness of the Amharic version of the HLS-EU-Q47 scale (HLS-Amh) was unsatisfactory and that it could not differentiate between individuals with low and high health literacy when surveying urban school adolescents and university students in Dire Dawa, Ethiopia.
The reliability of the Amharic version of the HLS-EU-Q47 (HLS-Amh) for measuring the health literacy of urban adolescents and university students in Dire Dawa, Ethiopia was relatively high (Cronbach's alpha coefficient = 0.910). The internal consistency (Cronbach's alpha) for the 47 items was greater than 0.90 among university students and 0.88 among school adolescents. Most of the subscales had high internal consistency, with values ranging from 0.787 to 0.935, except DP-HL in school adolescents, which was 0.743 and still at an acceptable level (Cronbach's alpha > 0.70) [37]. In addition, the split-half Spearman-Brown coefficients ranged from 0.621 for disease prevention health literacy among school adolescents to 0.88 for general health literacy among university students. These were mostly satisfactory [36][37][38] for Gen-HL and the three domains (HC-HL, DP-HL, and HP-HL) (Table 3). Thus, the instrument was reliable, with high internal consistencies for HC-HL, DP-HL, and HP-HL similar to those identified in the original HLS-EU survey, despite its weakness in construct and convergent validity. A reliability coefficient of 0.80 is considered the lowest acceptable threshold for a well-developed measurement tool [37]. Accordingly, the internal consistency of the HLS-Amh, as indicated by the Cronbach's alpha coefficients, was satisfactorily high for both Gen-HL (0.910) and all three subscales (Table 4).
In the original European study, the internal consistency reliability coefficients (Cronbach's alpha) were all greater than 0.90 [18]. Similarly, the coefficients obtained in several European countries (Austria, Bulgaria, Germany, Greece, Spain, Ireland, the Netherlands, and Poland) were about 0.90 or higher [2,20]. The various adaptation studies of the HLS-EU scale in other countries obtained quite similar values. For the Japanese version of the scale, Cronbach's alpha was 0.97 for Gen-HL and 0.91, 0.91 and 0.92, respectively, for the three sub-dimensions [24]. In a syndicated study in Turkey in which the HLS-EU-Q47 scale was used, Cronbach's alpha was 0.97 overall and 0.91, 0.92 and 0.93 for the sub-dimensions HC, DP and HP, respectively [29]. However, the alpha coefficients in our study were relatively lower for the three subscales and among the urban school adolescents, suggesting that the HLS-Amh was a relatively less reliable measuring instrument in this study context and population. The Amharic version of the HLS-EU-Q47 is therefore shown to be reliable but weaker in validity, so there is a need to develop another comprehensive health literacy measurement tool for urban school adolescents and university students as well as for the Ethiopian population.

Conclusions
The Amharic version of the HLS-EU-Q47 for urban school adolescents and university students in Dire Dawa, Ethiopia was reliable but weak in validity. The tool can therefore be used as a broad, general survey tool on awareness and knowledge rather than as a tool for screening substantial and clinically related inquiries. Using the tool and the sampling method would be of comparable value for identifying factors associated with health literacy.
The study results show the need for further adaptation and validation of a comprehensive tool for Ethiopian demographic, multilingual and cultural contexts. Researchers who want to survey health literacy using the HLS-EU-Q47 in the Ethiopian population should reconsider the validity of this tool with respect to cultural competence and the multilingual state of study participants, among whom local dialects, plain language and combinations of two or more languages are in use. Moreover, the language competency