Cultural Adaptation of Health Literacy Measures: Translation Validation of the Newest Vital Sign in Arabic-Speaking Parents of Children With Type 1 Diabetes in Kuwait

Purpose The purpose of the study was to assess the feasibility of use and reliability of the Arabic version of the Newest Vital Sign (NVS-Ar) in parents of children with type 1 diabetes (T1D). Methods The final translated version of NVS-Ar was administered to 175 adult caregivers of children with T1D who are native Arabic speakers. The association between NVS-Ar scores for the parents/legal guardians and A1C for their children was assessed. The internal consistency was evaluated by Cronbach’s α, and reliability was assessed by the test-retest method. Results The median (interquartile range) score was 4.0 (3-5). The internal consistency of the NVS-Ar was moderate (α = .58). The intraclass correlation coefficient was .61. There was no correlation between NVS-Ar score and A1C (Spearman’s ρ = .055; P = .62). Furthermore, there was a significant inverse association between adequate health literacy and optimal glycemic control among the children, which remained evident even after adjusting for the duration of T1D, age, or education of the parents/guardians. However, it lost statistical significance after adjustment for treatment regimen. Conclusion Study findings indicate that the NVS is unlikely to be a predictive tool for functional health literacy in Arabic settings and that there is a need to properly translate and validate other tools such as the Test of Functional Health Literacy in Adults or, alternatively, to develop a reliable tool.


Introduction
Health literacy (HL) is defined as "the cognitive and social skills that determine the motivation and ability of individuals to gain access to, understand and use information in ways that promote and maintain good health." 1,2 It is a complex concept with particular focus on the ability to process and understand both linguistic and numeric information in a health context. Such ability is critical to making appropriate informed health-related decisions. Inadequate HL has been linked to adverse health outcomes independently of education, ethnicity, and socioeconomic status. 3 As an example, inadequate HL was linked to smoking, 4 hospitalization, 5 poor glycemic control in diabetic patients, 3,6,7 feeding formula milk to infants, 8 limited participation in screening for diseases, [9][10][11][12] self-reported poor health, 13 and an increase in all-cause mortality. [14][15][16] In clinical settings, it is envisaged that when there is a gap between patients' HL and the demands of the health care system, those with inadequate HL are particularly disadvantaged. 17 As part of clinical practice, patients are given written and verbal information about their medical conditions, treatment, or preventive procedures and are asked to sign different types of forms and consent to medical procedures. Patients with inadequate HL might have difficulties in understanding complex information and can therefore face difficulties when making informed decisions regarding treatment or other matters related to their medical conditions. Health care professionals need to know the HL level of their patient population to provide information at an appropriate level to help patients make fully informed decisions regarding their health care. 18 For example, pooled prevalence data show that nearly 1 in 3 patients with type 2 diabetes (T2D) in the US have limited HL. 19 Unfortunately, health care professionals often fail to identify individual patients with limited HL. [20][21][22]
Clearly, patients with limited language skills, who are illiterate or semiliterate, will be disadvantaged in HL testing, and it is important to note that most patients are unwilling to admit that they have literacy problems. 5 An appropriate and objective measure of HL is crucial to investigate the impact of inadequate HL on individuals' health and health care use in different cultures. The Test of Functional Health Literacy in Adults (TOFHLA) and the Rapid Estimate of Adult Literacy in Medicine (REALM) are 2 of the most commonly used tools to measure HL. 23,24 However, these tools have some limitations to their usefulness in different settings. For example, the TOFHLA tool takes around 22 minutes to complete 23 and is therefore not practical for use in a busy clinical setting. Furthermore, many of these tools have not been translated and validated in major languages such as Arabic, which has distinct dialects spoken in different countries. The short version of the TOFHLA (S-TOFHLA) has been translated into Arabic, but no further steps to validate the tool have been made. 25 The S-TOFHLA and the revised REALM (REALM-R) have been translated into Arabic for use in the Lebanese population only. 26 In 2005, Weiss et al 27 developed a screening instrument, the Newest Vital Sign (NVS), to assess an individual's level of HL. In an average of 3 minutes, the NVS assesses mathematics, reading, and comprehension skills and abstract reasoning based on 6 questions referring to an ice cream nutrition label. This rapid test has been found to be as sensitive in identifying those with inadequate HL as the gold standard measure of HL (ie, TOFHLA). 23 The NVS has been widely used in the US and has been validated for use in the UK. 17 The test has been translated into other languages, for example, Dutch, 28 Turkish, 29 and Japanese, 30 but has only been translated into Arabic, with no further validation. 25
HL has been adopted by the United Nations as an important sustainable development goal, 31 yet it remains a neglected topic in Arab countries. It is crucial to have a properly validated tool to assess HL in Arabic-speaking populations. The lack of a valid tool to measure HL in Arabic settings has become a hindrance to addressing HL and its impact on health status and health care utilization in the Arab world. As a striking example of the dire need for addressing HL in Arab countries, Al-Taiar et al 32 reported that the majority of patients on the waiting list for knee replacement surgery were unaware of the benefit of the procedure or the time the prosthesis would last and attributed the delay in having the procedure to the lack of information. Similarly, more than 60% of patients with T2D showed inadequate or marginal HL using TOFHLA. 33 This study aimed (a) to translate and validate the NVS tool into Arabic and (b) to assess the feasibility of use and reliability of the Arabic version of the NVS in parents of children with type 1 diabetes (T1D).

Phase 1: Translation of the English Version of the NVS Instrument to Arabic
Translation and cross-cultural adaptation of the Arabic version of the NVS (NVS-Ar) was conducted according to established guidelines. 34,35 The first phase included forward-translation, expert panel review, and backward-translation of the original English version of the NVS instrument (Figure 1, Panel A, Phase I). The original instrument, including the food label and the questions, was translated into Arabic independently by 2 bilingual translators. Both translators were instructed to compare the 2 separate Arabic translations to produce a single reconciled Arabic version of the instrument (Figure 1, Panel B, Phase II). An expert panel was formed to review the Arabic version of the NVS instrument from the linguistic and cultural perspectives. The panel changed the food label translation to bring it in line with the food labeling system used in Kuwait, namely, the translation of serving size, serving per container, and amount per serving. The numeric values were kept in the English language in the food label as per the labeling system in the country. The expert panel consisted of physicians, clinical nurses, dieticians, and local food manufacturers. After the review and modification, the final Arabic version was back-translated to English independently by 2 different bilingual translators who were not familiar with the original English instrument. As with the forward-translation, the translators compared their 2 back-translated versions and produced a single reconciled English version of the instrument.

Phase 2: Translation Validation of the Arabic Version of the NVS-Ar Instrument
To validate the translation from the first phase, the methodology described in international guidelines was followed. 34,36,37 This method involves formal comparison of the original English NVS and the back-translated English version. Medical students (n = 11) and faculty and supportive staff (n = 12) at the Faculty of Medicine, Kuwait University compared each item in the 2 English versions and ranked them on comparability of language and similarity of interpretation using a 7-point Likert-type scale (comparability of language: 1 = extremely comparable, 6 = not at all comparable; similarity of interpretation: 1 = extremely similar, 6 = not at all similar). Medical students and staff at the Faculty of Medicine are fluent in English. Comparability of language refers to the formal similarity of words, phrases, and sentences, whereas similarity of interpretation refers to the degree to which the 2 versions would engender the same attitude response even if the wording was not the same. Items with mean scores above 3 for comparability and/or similarity were reevaluated. This step is aimed at identifying problematic items and retranslating them until each item is interpreted in the same manner in both languages (Figure 1, Panel B). Findings from this phase are shown in Table 1. The final version of the NVS-Ar can be obtained by contacting the first author.
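The flagging rule described above (items with a mean rating above 3 on either scale are reevaluated) can be sketched in a few lines of Python. This is only an illustration: the function name `flag_items`, the item labels, and the ratings are hypothetical and are not data from this study.

```python
from statistics import mean

def flag_items(ratings, threshold=3.0):
    """Return the items whose mean comparability OR mean similarity
    rating exceeds the threshold (higher ratings = worse agreement)."""
    flagged = []
    for item, (comparability, similarity) in ratings.items():
        if mean(comparability) > threshold or mean(similarity) > threshold:
            flagged.append(item)
    return flagged

# Hypothetical ratings from 4 raters: item -> (comparability, similarity)
ratings = {
    "Q1": ([1, 2, 1, 2], [1, 1, 2, 1]),  # both means well below 3
    "Q2": ([4, 3, 5, 4], [2, 3, 2, 2]),  # comparability mean is 4.0
    "Q3": ([2, 2, 3, 2], [4, 4, 3, 5]),  # similarity mean is 4.0
}

print(flag_items(ratings))  # ['Q2', 'Q3']
```

Flagged items would then go back for retranslation and another round of rating, as in Figure 1, Panel B.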

Phase 3: Feasibility and Reliability Testing
Testing was conducted in a group of parents or legal guardians of children with T1D attending hospital-based outpatient diabetes clinics in Kuwait. The final Arabic version of NVS-Ar was administered to 175 native Arabic-speaking parents/guardians. Caregivers who had documented learning disability or self-reported visual abnormalities preventing them from visualizing the food label were excluded from the study. A subgroup of participants was asked to revisit the hospital, and a retest was done 1 to 3 weeks later.

Phase 4: Validity and Functionality Testing
Previous studies have shown that HL assessed in parents is related to glycemic control of their children. 38 Therefore, we investigated the functionality of NVS-Ar by assessing the association between the NVS-Ar score of the parent/guardian and their child's A1C level. Optimal glycemic control in this study was defined as A1C <7.5%, per the 2018 International Society for Pediatric and Adolescent Diabetes guidelines. 39 The validity of NVS-Ar was also assessed by investigating the association between NVS-Ar score and educational achievement.

Data Analysis
Data were analyzed using Stata 12 (Stata Corporation, College Station, TX, USA). Categorical variables were summarized by proportions and quantitative variables by mean (SD) or median (interquartile range [IQR]). The feasibility of NVS-Ar was assessed by analyzing the number of questions answered and the time participants took to complete the test. Internal reliability was assessed by Cronbach's α. External reliability was assessed by the test-retest method using the intraclass correlation coefficient (ICC) with a 2-way mixed effects model.
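For readers unfamiliar with the two reliability indices, the following Python sketch shows how they can be computed from raw data. It is not the Stata code used in the study. The text does not state which ICC form was used, so the sketch assumes the single-measure, consistency form of the 2-way mixed model, often written ICC(3,1); the function names and the toy data are hypothetical.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """rows: one list of item scores per participant (all the same length).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def icc_consistency(rows):
    """Single-measure consistency ICC from a 2-way ANOVA decomposition
    (assumed form). rows: one list of scores per participant, one column
    per occasion (for example, test and retest)."""
    n, k = len(rows), len(rows[0])
    grand = mean(x for r in rows for x in r)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(r[j] for r in rows) - grand) ** 2 for j in range(k))
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Hypothetical data: 4 participants answering 3 items scored 0/1
items = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
print(round(cronbach_alpha(items), 2))  # 0.6

# Hypothetical test-retest totals: retest is always test + 1
pairs = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(round(icc_consistency(pairs), 2))  # 1.0
```

The second example shows why the choice of ICC form matters: a constant shift between test and retest leaves a consistency ICC at 1.0, whereas an absolute-agreement form would penalize it.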
The validity of NVS-Ar was assessed by calculating the association with educational achievement. Furthermore, the functionality of NVS-Ar was investigated by assessing the correlation between NVS-Ar scores for the parents and A1C levels for their children. Logistic regression was used to investigate the association between NVS-Ar score (as a continuous and categorical variable) and A1C while adjusting for potential confounders.
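As a simple illustration of the kind of association the logistic regression estimates, the sketch below computes an unadjusted odds ratio with a Wald-type 95% confidence interval from a 2x2 table. With a single binary predictor, this cross-product ratio equals the exponentiated coefficient of an unadjusted logistic regression. Adjusted estimates such as those reported here require a multivariable model and cannot be obtained this way. The counts are invented for illustration and are not the study's data.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald-type confidence interval for a 2x2 table.
    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome"""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical counts. Exposure: adequate HL. Outcome: optimal control.
# adequate HL:   20 optimal, 80 not optimal
# inadequate HL: 25 optimal, 50 not optimal
estimate, lower, upper = odds_ratio(20, 80, 25, 50)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An odds ratio below 1 in this layout would correspond to an inverse association between adequate HL and optimal glycemic control.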
This study was approved by the Health Sciences Center Ethical Committee, Faculty of Medicine, Kuwait University.

Translation Validation of the Arabic Version of the NVS Instrument
To validate the translation, medical students, faculty, and supportive staff at the Faculty of Medicine, Kuwait University systematically compared the original English NVS and the back-translated English version (Form A and Form D in Figure 1). Table 1 shows the median comparability and similarity scores of each item on the questions/scoring sheet of the NVS instrument. Backward translation produced identical wording for Question 6, which was therefore not included in the comparability and similarity testing. Two items were considered problematic during this phase and had to be retranslated and reevaluated until the desired mean scores were obtained (Figure 1, Panel B).

Feasibility and Reliability Testing
The final Arabic version of the NVS-Ar instrument was administered to 175 participants. Table 2 shows the sociodemographic characteristics of the parents or legal guardians who participated in the study. Most study participants were Kuwaiti (n = 108, 62.79%), and most were mothers (n = 132, 75.86%). The mean (SD) age of the index child was 9.20 (2.98) years. The median (IQR) of the duration of T1D was 32.37 (18.97-54.93) months.
The final Arabic version of NVS consists of a nutritional label and 6 questions, with 1 point awarded for each correct answer, giving a minimum score of 0 and a maximum score of 6. The distribution of the NVS-Ar score is shown in Figure 2. The distribution is negatively skewed, with a large number of participants achieving high scores, which indicates a good level of HL. The median (IQR) score was 4.0 (3-5). The difficulty of individual questions is shown in Table 3. Only 6 (3.43%) participants scored 0, and the median (IQR) of the time required to do the test was 4.36 (3-7) minutes. The internal consistency of the NVS-Ar was moderate (α = .58), and the ICC as a measure of test-retest reliability was .61.
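The scoring rule above (1 point per correct answer, range 0 to 6) and the summary statistics can be sketched as follows. The cutoff of 4 or more for adequate HL follows the definition used later in this article (score >3). The function names and the sample scores are hypothetical, and quartiles depend on the interpolation method, so values from other software may differ slightly.

```python
from statistics import median, quantiles

def nvs_score(answers):
    """answers: 6 booleans, True where the answer was correct."""
    if len(answers) != 6:
        raise ValueError("the NVS has exactly 6 questions")
    return sum(1 for correct in answers if correct)

def adequate_hl(score, cutoff=4):
    """Adequate HL is taken here as a score of 4 or more (score > 3)."""
    return score >= cutoff

def summarize(scores):
    """Return (median, first quartile, third quartile)."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3

# Hypothetical participant who answered questions 1-4 correctly
score = nvs_score([True, True, True, True, False, False])
print(score, adequate_hl(score))  # 4 True

# Hypothetical sample of total scores
sample = [0, 2, 3, 3, 4, 4, 4, 5, 5, 6]
med, q1, q3 = summarize(sample)
print(f"median = {med}, IQR = {q1}-{q3}")
```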

Assessing the Validity of NVS-Ar
The validity of NVS-Ar was assessed by investigating the association between NVS-Ar score and educational level. Table 4 shows the association between NVS-Ar score and the educational level. There was no association between NVS-Ar score and education level (P = .423). This remained unchanged after adjusting for age, gender, and nationality. Because previous studies showed that parents' HL is related to glycemic control of their children, 38 the functionality of NVS-Ar was investigated by assessing the association between NVS-Ar scores for the parents and A1C levels for their children. There was no correlation between NVS-Ar score and A1C (Spearman's ρ = .055; P = .62). Furthermore, there was a significant inverse association between adequate HL (score >3) and optimal glycemic control among the children (those with adequate HL tended to have children with higher A1C levels; Figure 3). This remained evident even after adjusting for the duration of T1D and age or education of the parents/legal guardians. However, it lost statistical significance after adjusting for treatment regimen. It is worth noting that educational level showed a positive association with proper glycemic control, although it did not reach statistical significance.
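Spearman's ρ, used above for the score-A1C correlation, is the Pearson correlation computed on ranks, with tied values given their average rank. Ties are unavoidable here because the NVS-Ar score takes only 7 possible values. The sketch below is a minimal pure-Python illustration with made-up data; it does not compute the P value, and the function names are hypothetical.

```python
import math
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]

# Hypothetical parent scores and child A1C values (%)
scores = [2, 4, 4, 5, 6, 3]
a1c = [8.1, 7.9, 9.0, 7.2, 8.4, 8.8]
print(round(spearman_rho(scores, a1c), 3))
```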

Discussion
It is increasingly recognized that inadequate HL is associated with poor health-related knowledge and comprehension and, as a result, adverse health outcomes. 40 Mounting evidence indicates that general HL is a major factor affecting an individual's health status. 41 The availability of reliable instruments to measure HL has contributed to raising awareness of the impact of HL on individuals' health. 23,27,42,43 Most of the instruments were developed and validated in English, and because it is difficult to develop and validate new instruments in other languages de novo, such instruments are translated into other languages to be adopted in different cultures. However, it is a challenge to adapt such instruments in a culturally relevant and comprehensible form while maintaining the meaning and the intent of the original items. 34 This study reports on the translation and validation of the NVS instrument into Arabic using a validation process as described in the international guidelines. 34,36,37 Cronbach's α was .58, which is not high as an index of internal consistency of a test. However, Cronbach's α is only a reflection of the interrelatedness of the items in the test 44 (if the items in a test are correlated with each other, the value of the α coefficient is increased), and because the fifth and sixth questions are meant to measure comprehension and the first 4 questions (questions 1-4) are meant to measure numerical skills, a low α coefficient is not surprising. Furthermore, Cronbach's α is affected by the length of the test, and if the test is too short, like the NVS, the value is reduced. In fact, no study has reported a high value. Cronbach's α was reported to be .69 among 85 participants in Iraq, 25 .70 in Turkey, 29 .74 in the UK, 17 .69 in Spain, 27 and .76 for NVS-D in the Netherlands. 28 Although an α coefficient of ≥.70 is deemed to be satisfactory, 45 this threshold has been widely criticized. 44 A test with α = .70 still has a huge amount of error that may exceed 50% of the results of the test.
This study agrees with previous studies that the fifth question, "Is it safe for you to eat this ice cream?," is particularly problematic. 46 This question requires a dichotomous answer ("yes" or "no"), which can be easily answered by guessing. Salgado et al 46 showed that most of those who answered the question correctly failed to report the reason for their answer, which is requested in the next question. In this study, the fifth question was most often answered correctly, but approximately one third of those who answered the fifth question correctly failed to explain why in the next question. In Turkey, the fifth question was the most correctly answered question. 29 As mentioned previously, unlike the first 4 questions, which measure numerical skills, the fifth and sixth questions evaluate comprehension and are therefore inherently unrelated to the other 4 questions in the NVS. This reduces the α coefficient and hence undermines reliability.
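The earlier point that Cronbach's α depends on test length can be illustrated with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor n as n·r / (1 + (n − 1)·r). This calculation is not part of the study. It assumes that any added items would be parallel to the existing ones, which is a strong assumption given that the NVS mixes numeracy and comprehension items. The function names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test length is multiplied by
    length_factor, assuming the added items are parallel to the old ones."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

def length_factor_needed(current, target):
    """Length multiplier needed to move reliability from current to target."""
    return target * (1 - current) / (current * (1 - target))

alpha_observed = 0.58  # Cronbach's alpha reported for the 6-item NVS-Ar
factor = length_factor_needed(alpha_observed, 0.70)
print(f"length factor: {factor:.2f}")        # length factor: 1.69
print(f"items needed: about {6 * factor:.0f}")  # items needed: about 10
```

Under these assumptions, a 6-item test with α = .58 would need roughly 10 comparable items to reach the conventional .70 threshold.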
Most of the previous studies did not report test-retest reliability (external reliability). The external reliability was assessed by the test-retest method using the ICC with a 2-way mixed effects model. This was found to be .61, a value that is deemed to be low. Usually, an ICC of >.80 is indicative of excellent reliability. 47 No association was found between NVS-Ar score and the educational level of the parents/legal guardians. This finding is in agreement with previous studies that showed no link between NVS score and educational level, 17,25,46 although some studies have reported a link. 29 In the UK, NVS score was reported to have a weak correlation with educational attainment. 17 Nevertheless, it is known that educational achievement is not a good predictor of literacy skills because many individuals have literacy skills well below what might be expected from their level of education. 17 Previous studies showed that HL of the parents of children with diabetes is related to glycemic control of their children, 38 thus we investigated the functionality of NVS-Ar by assessing the association between NVS-Ar scores for the parents/legal guardians and A1C levels for their children with T1D. The results show no correlation between NVS-Ar score and the A1C of the children. In fact, there was a significant inverse association between HL of the parents/legal guardians as measured by NVS-Ar (score ≥4) and optimal glycemic control in their children. These results are not surprising given that the NVS was previously shown to have limited utility in predicting medication adherence among Portuguese adults, 46 although this was attributed to a floor effect (a large number of participants had very low scores). In the present study, despite the high typical score (median = 4.0; ie, limited floor effect), the NVS-Ar showed limited predictability of A1C, which casts doubt on the predictive value of the NVS-Ar in our setting.
Al-Jumaili et al 25 suggested that the NVS might not be applicable as an HL test for people in Iraq, a neighboring country with similar culture and lifestyle, because they are not accustomed to reading product labels in their daily life. Arabic food culture is centuries old and reflects a long history of trade in unique spices, herbs, and foods, primarily from specific parts of the world such as Africa and India, which to date might not use food labels for such products. In the Netherlands, the NVS was found to be problematic, and a new NVS for Dutch people was developed. 28 Furthermore, the limited predictive value of the parents' NVS-Ar scores in our setting might be attributed in part to the nature of the living situation of some families in Kuwait. As in many Eastern cultures, some parents and their children continue to live with their large extended families, where they have only partial control over the food served to their children.
The NVS is attractive because it is brief and covers both reading and numerical skills. However, the drawbacks of the NVS include low reliability and doubts about its validity. The validity of the NVS has been investigated by running a parallel HL test such as TOFHLA, which does not really show the functionality of the tool. The study findings suggest that the NVS is too short to provide reliable results and to be a predictive tool for functional HL, at least in an Arabic setting.
HL is an increasingly researched area in health care in the West compared to a severe lack of such research in the Arab world. The current study highlights the need for future research and international comparison of HL in Arabic-speaking populations. It presents a validated and rigorous cross-cultural adaptation process of the tool that addressed potential differences in cultural interpretation of language and its utilization. However, there are some limitations to the present study. First, due to the lack of a "gold standard" Arabic tool to test HL among adults, the process of validating the Arabic version of the NVS is incomplete. Second, to test feasibility and reliability of the Arabic version, parents/legal guardians of children under follow-up at the diabetes clinics were selected. These participants may have different HL skills compared to the general adult population in the country, which might limit the generalizability of our findings. Testing of the Arabic version of the NVS on other adult populations could further validate the feasibility and reliability of the instrument.
In conclusion, the study findings indicate that the NVS is unlikely to be a predictive tool for functional HL in Arabic settings and that there is a need to properly translate and validate other tools such as TOFHLA or, alternatively, to develop a reliable tool. Such work is a prerequisite for initiatives that aim to improve HL in Arab countries.