With the use of health surveys rapidly extending from research to clinical, academic and commercial settings, the demand for valid and reliable measures is increasing. The comparability of health survey data across populations is also vital and has been challenging, in part due to the lack of standardization of health instruments and variation in survey methodology5. Because the Arabic language includes varying dialects and culturally specific idioms10, producing standardized and validated instruments in Arabic is even more challenging. Instruments may also perform differently across contexts, age groups and health conditions and thus require validation in different populations2, 3, 13.
There is a paucity of research on the quantity and quality of Arabic health measures. Only two systematic reviews were identified that evaluated existing Arabic health measures. A systematic review of Arabic generic health-related quality of life measures by Al Sayah (2012) reported on 20 studies covering 6 measures and found moderate to good quality cross-cultural adaptations; however, evaluation of measurement properties was limited by deficient evidence14. Fasfous et al (2017)15 conducted a review to evaluate the quality of studies involving neuropsychological assessment of Arabic-speaking subjects. They reported on 384 studies applying 117 instruments and found that nearly half of the publications did not use cognitive tests that were “developed, translated, adapted, or standardized according to international guidelines of psychological measurement”15. Reviews of English health measures of varying constructs have reported similar flaws in methodological quality13, 16–18. Furthermore, after excluding intelligence and cognitive screens, Fasfous et al (2017) found that the three most frequently used tests, the Trail Making Test, Wechsler Memory Scale and Wisconsin Card Sorting Test, had no reporting on their validity for Arab individuals15. Fasfous et al (2017)15 also reported that while 57 tools referenced norming efforts, they were sometimes inaccessible. Similarly, our review found that fewer than 10% of the health measures identified in our search were available directly online (Table 1), which poses an additional obstacle to accessing existing Arabic health measures.
Our review found Saudi Arabia and Lebanon to have the highest rate of publications reporting on development of measures (Figure 3). Interestingly, however, while Fasfous et al (2017) found that these same countries had the greatest number of publications on the use of neuropsychological measures, they ranked lowest in reporting the validation and/or norming of instruments15. A high percentage of the publications in our results reported on validity and reliability (Figure 4); however, because this review does not include a quality check of the methodology, it is not clear how many Arabic health measures in the AHM database meet international guidelines for psychometric testing and norming.
Similar to previous reviews14, 15, we found the number of publications on Arabic measures to be limited in comparison to those in English6–9. This is likely due to several factors, including the limitation of our search to major English databases and literature published in the English language. It is also likely that more studies may be found in local journals and in non-indexed or non-peer-reviewed journals. In addition, some measures are translated as part of a research project and either have no publications on their methodology or have publications that provide too little methodological information, which excluded them from our review. Finally, some translation or validation studies may never be published or remain part of the gray literature from academic institutions or working groups. Nonetheless, it is encouraging that the number of publications appears to have increased steadily over the past decade (Figure 1)15.
This review was limited by the inclusion of only major English databases. A search of local databases was considered too challenging, as it would involve over 25 countries, some of which have limited internet access to their journals. We also believe that some relevant studies may not have been identified, as the translation or psychometric testing of measures may be part of a larger project and thus not captured by the keywords used.