Describing progression of health literacy skills: Establishing the HLS-Q12 cut-off scores

Background: The self-reported European Health Literacy Survey Questionnaire (HLS-EU-Q47) is a widely used health literacy measure. Based on confirmatory factor analyses and Rasch modelling, the short form HLS-Q12 was developed to meet the expectations of the unidimensional Rasch measurement model. After its publication, there has been a worldwide call to identify HLS-Q12 cut-off scores and establish clearly delineated standards in terms of the skills assessed. This study therefore aims to identify the HLS-Q12 scores associated with statistically distinct levels of proficiency and to construct a proficiency scale that expresses progression: what individuals typically know and can do at increasingly sophisticated levels of health literacy.
Methods: We applied the unidimensional Rasch measurement model for polytomous items to responses from 900 randomly sampled individuals and 388 individuals with type 2 diabetes. Using Rasch-based item calibration, we constructed a proficiency scale by locating the ordered item thresholds along the scale; by applying Wright’s method for the maximum number of strata, we determined the cut-off scores for significantly different levels. By directly referring to item content that people who reached the cut-off scores view as ‘easy’, we could describe these gradually more advanced levels of health literacy.
Results: We identified statistically distinct levels of health literacy at the empirically identified cut-off scores 27, 33 and 39 and confirmed them by analysing the responses from individuals with diabetes. Individuals who reach these cumulative benchmarks of marginal, intermediate and advanced literacy can typically access, appraise and apply information relevant to stay healthy, improve health and critically judge health claims and compare treatments, respectively.


Background
Research recognises limited health literacy (HL) as having major consequences for health outcomes [1][2][3]. To improve quality in the health services, reduce inequalities in health and promote factors that enhance health, the Norwegian Ministry of Health and Care Services recently published a national strategy for HL [4]. To support the strategy and provide knowledge-based policy and practice on HL, the Ministry commissioned the Norwegian Directorate of Health to map HL in the population and selected subgroups. Consequently, the Directorate appointed representatives to join The Action Network on Measuring Population and Organizational Health Literacy (M-POHL) within the European Health Information Initiative (EHII) of WHO-Europe. The M-POHL network promotes standards and guidelines for European co-operation in the field of HL; functions as a communication platform for sharing and disseminating information and expertise among its members, stakeholders and policy-makers; and facilitates comparative HL measurement.
The M-POHL network's internationally coordinated HL surveys aim to become Europe's premier yardstick for evaluating the efficiency of health policies. By identifying the characteristics of health literate populations, M-POHL allows governments to identify effective knowledge-based policies and compare good practices that may address health disparities and reduce inequalities in HL, regionally, nationally and internationally. M-POHL is firmly committed to the continuing development of its comprehensive and widely used self-reported European Health Literacy Survey questionnaire (HLS-EU-Q47). The questionnaire was applied in a population study in eight European countries in 2011 [1] and later in Asian countries [5] and further European countries, such as Norway in 2014-2015 [6,7]. An accumulated body of corroborating evidence shows that the HLS-EU-Q47 data are not unidimensional [5][6][7][8][9][10][11], which gave rise to a controversy that led to the publication of the HLS-Q12 short form [7]. Since its publication, there has been an international request to establish the HLS-Q12 cut-off scores for different HL levels.
To obtain justified estimates of the prevalence of marginal, intermediate and advanced HL as measured by the HLS-Q12, we need to identify the HLS-Q12 cut-off scores associated with statistically distinct levels of proficiency. We will do so by applying Wright's sample-independent method for the maximum number of statistically distinct strata [12] to our data collected in 2014. To assure the quality and utility of the results, we will evaluate the empirically derived cut-off scores by repeating the analysis in a sample drawn from a population with chronic disease (type 2 diabetes) in 2015.
To describe what people who reach the different cut-off scores typically know and can do, we need to construct a proficiency scale by directly referring to item content; that is, we need to construct what is called a Wright map based on ordered item thresholds or steps [13]. This proficiency scale will reveal progression in HL: what it means to move from one level to the next in terms of HLS-Q12 item content.
The objective of this paper is therefore to report how we identified the HLS-Q12 cut-off scores and how we developed the HLS-Q12 proficiency scale of increasingly sophisticated and cumulative levels of HL. The results of these two distinct but related processes will answer the following two research questions: (1) What are the HLS-Q12 cut-off scores for statistically distinct levels of health literacy? (2) What do individuals who reach the HLS-Q12 cut-off scores typically know, and what can they do?

Sampling
In November 2014, responses from 900 randomly sampled individuals aged 16 and above were collected in Norway using computer-assisted telephone interviews (CATI). The international survey agency Ipsos collected the data according to the standardized interview protocol for the HLS-EU-Q47 [8]. In March 2015, responses from 388 individuals with type 2 diabetes aged 18 and above were collected in Norway using a paper-and-pencil version of the 47-item HLS-EU-Q47. The sample was drawn from the member list of the Norwegian Diabetes Association, which provided a geographically stratified sample that included the members' places of residence. In this paper, we base our analyses on the 12-item short form HLS-Q12 [6,7]. Finbråten et al. [6,7] thoroughly describe the sampling procedures and sample characteristics.

Definitions and conceptual models of health literacy
Definitions of HL address the lifelong learning perspective, that is, the development of individual skills to access, appraise and apply health information essential to make appropriate health decisions [14][15][16][17][18]. Broader definitions consider HL as an interaction between the proficiency of individuals and the demands of health services and systems [14,19,20], such as knowing how to navigate health systems and between health services. The interaction aspect of HL implicitly assumes culturally sensitive health communication between patients, health educators and health providers.
Conceptual models of population or 'public' HL [21] reflect an analysis of aspects of the HL construct, a process that reveals the complexity of the construct. Conceptual models of HL also consider factors and covariates that predict an individual's HL and view HL itself as a critical determinant of health outcomes. However, these latter aspects are not essential to the measurement of the latent trait of HL.
Basic skills or 'fundamental literacy' [19], such as oral skills [14,22], reading and numeracy [14,[23][24][25], writing [14] and digital skills [26], are viewed as key aspects of HL. Other key aspects are media literacy [27], the ability to apply health information to make knowledge-based decisions [23] and the ability to interact with [28] and navigate within the health system [22,29]. In their systematic review, Sørensen et al. [18] pointed out that HL has a wide range of aspects and that there is a large variation between the conceptual models of HL. They conclude that Nutbeam's model [30], which refers to the three categories functional, interactive and critical HL, summarizes many of the key aspects of HL. Their claim is supported by the fact that other conceptual models, such as those of Kickbusch and Maag [31] and Manganello [27], take a similar approach. Based on the literature review, Sørensen et al. [18] suggested the 'integrated conceptual model of health literacy', which may be narrowed down to three health domains (HD): health care (HC), disease prevention (DP) and health promotion (HP); and four cognitive domains (CD): access (A), understand (B), appraise (C) and apply (D) health information. However, few definitions and conceptual models of HL identify stages of HL and organize a continuous progression of HL. This is a major challenge to policy-makers who aim to respond to a current major societal challenge, namely how to bridge health literacy gaps and promote equity by improving health education and simplifying health services. Our study therefore aims to respond to this challenge.
The unidimensional Rasch measurement model for polytomously scored items
Model requirements and properties: The concept of measurement implies a linear continuum of some kind, such as height and mass. Measures of psychological performance, by contrast, have neither an absolute zero point nor equal units of measurement. In fact, we are hampered by the absence of any unit by which to measure. Rasch measurement models [32], which satisfy axioms of conjoint measurement [33][34][35][36][37], construct measures with equal-interval properties, an arbitrary zero point and the 'logit' as unit from ordered data, such as rating scales [38]. Conceptually, Rasch modelling involves selecting a measurement model, applying it to the data and examining the data-model fit. Rasch models require local independence [39], monotonicity [40] and homogeneity or parallel item characteristic curves [40,41], and they have the properties of sufficiency [42], additivity [36,43], separability [44] and specific objectivity or invariance [45,46].
Unidimensional Rasch models: The unidimensional Rasch model [32] for dichotomous items, $\log\left(\frac{p}{1-p}\right) = \beta_n - \delta_i$, models the log-odds of a correct response, where $p$ is the probability of a correct response, as the distance between the proficiency $\beta_n$ of individual $n$ and the difficulty or threshold $\delta_i$ of item $i$. The partial credit model (PCM; [47]) is a straightforward generalisation of the unidimensional Rasch model to polytomous data: it applies the dichotomous model to each pair of adjacent, increasing rating-scale response categories.
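The two models just described can be illustrated with a minimal Python sketch. It assumes the usual PCM parameterisation in which the log-odds of adjacent categories equal the person proficiency minus the corresponding threshold; the function names and the example thresholds are hypothetical and are not taken from the HLS-Q12 calibration.

```python
import math

def rasch_probability(beta, delta):
    """Dichotomous Rasch model: probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def pcm_probabilities(beta, thresholds):
    """Category probabilities for one item under the partial credit model.

    beta: person proficiency in logits.
    thresholds: uncentralised thresholds [delta_i1, ..., delta_im] in logits.
    Returns m + 1 probabilities, lowest category first.
    """
    # Cumulative sums of (beta - delta_ij); the lowest category has an empty sum of 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (beta - delta))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical four-category item (three thresholds) and a person at 0.5 logits.
probs = pcm_probabilities(0.5, [-1.0, 0.0, 1.5])
print([round(p, 3) for p in probs])
```

With a single threshold, `pcm_probabilities` reduces to the dichotomous model, which is one way to see why the PCM is described as a generalisation.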
Item thresholds and proficiency: Applying the PCM to the HLS-Q12 four-point rating scale, with the categories very difficult (1), difficult (2), easy (3) and very easy (4), each item $i$ is assigned three 'uncentralised item thresholds' $\delta_{ij}$ ($j = 1, 2, 3$), and each individual $n$ is assigned a proficiency estimate $\beta_n$, the most likely point estimate based on their HLS-Q12 score. A standard error (SE) quantifies the precision of this estimate, where a low SE means that the random error is small and that the person measure is more reliable. The SE is the reciprocal of the square root of the Fisher information, where high Fisher information indicates that the maximum of the log-likelihood is sharp, i.e., there are few nearby values with a similar log-likelihood.
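The link between Fisher information and the SE can be sketched as follows. The sketch assumes the standard result that, under the PCM, the information at a given proficiency is the sum over items of the variance of the item score, and that the SE is the reciprocal of the square root of that sum. The helper names and the 12 identical items are hypothetical; they do not reproduce the RUMM2030 estimates.

```python
import math

def category_probabilities(beta, thresholds):
    """PCM category probabilities for one item, lowest category first."""
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (beta - delta))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def scale_information(beta, items):
    """Fisher information at beta: sum of item-score variances."""
    info = 0.0
    for thresholds in items:
        probs = category_probabilities(beta, thresholds)
        mean = sum(k * p for k, p in enumerate(probs))
        info += sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return info

def standard_error(beta, items):
    """SE of the proficiency estimate: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(scale_information(beta, items))

# Hypothetical 12-item scale with identical thresholds (illustration only).
items = [[-1.5, 0.0, 1.5]] * 12
print(round(standard_error(0.0, items), 2), round(standard_error(4.0, items), 2))
```

In this illustration the SE is larger far from the thresholds than near them, which is why extreme sum scores tend to be measured less precisely.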
Overall data-model fit: Comparing observed proportions to model-expected probabilities, the RUMM2030 statistical software package [48] assesses the overall data-model fit by the summary fit chi-square ($\chi^2$) statistic. For large sample sizes, the root mean square error of approximation ($\mathrm{RMSEA} = \sqrt{\frac{(\chi^2/df) - 1}{N - 1}}$, expected value = 0.0 for perfect data-model fit) might be calculated as a supplementary statistic to determine data-model fit [49]. When the PCM is applied, the additional assumption of ordered thresholds is vital.
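As a small illustration of the RMSEA formula, the following sketch computes it from a chi-square value, its degrees of freedom and the sample size. Truncating negative values at zero is a common convention and an assumption here, as the source does not state how values of $\chi^2/df$ below 1 are handled. The input numbers are invented.

```python
import math

def rmsea(chi_square, df, n):
    """Root mean square error of approximation.

    Assumption: values under the square root are truncated at 0,
    so chi_square/df <= 1 yields RMSEA = 0.
    """
    value = (chi_square / df - 1.0) / (n - 1.0)
    return math.sqrt(max(value, 0.0))

# Invented example: chi-square twice its degrees of freedom, N = 901.
print(round(rmsea(72.0, 36, 901), 4))
```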
Variance, reliability and strata
RUMM reports a proficiency estimate with associated SE for each observed HLS-Q12 score, ranging from 12 to 48 points; the standard deviation (sd) of the distribution of these person proficiency estimates; their average measurement error variance, i.e. the squared 'root mean square error' ($\mathrm{RMSE}^2 = \frac{1}{N}\sum_{i=1}^{N}\mathrm{SE}_i^2$); their estimated 'true' population variance ($\mathrm{SD}^2 = \mathrm{sd}^2 - \mathrm{RMSE}^2$); and reliability (Cronbach's $\alpha = \mathrm{SD}^2/\mathrm{sd}^2$) when there is complete data. Applying Wright's sample-independent method for the maximum number of statistically distinct strata [12], we estimate the associated Wright strata reliability $= \mathrm{Strata}^2/(1 + \mathrm{Strata}^2)$.
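The quantities in this paragraph follow from one another by simple arithmetic, as the sketch below shows. The function names and the input values are hypothetical; only the formulas come from the text. For instance, four strata give a strata reliability of 16/17, or about 0.94.

```python
def error_variance(standard_errors):
    """RMSE^2: mean of the squared standard errors."""
    return sum(se ** 2 for se in standard_errors) / len(standard_errors)

def true_variance(observed_sd, standard_errors):
    """SD^2 = sd^2 - RMSE^2."""
    return observed_sd ** 2 - error_variance(standard_errors)

def reliability(observed_sd, standard_errors):
    """Reliability = SD^2 / sd^2."""
    return true_variance(observed_sd, standard_errors) / observed_sd ** 2

def strata_reliability(strata):
    """Wright strata reliability = Strata^2 / (1 + Strata^2)."""
    return strata ** 2 / (1 + strata ** 2)

# Invented inputs: observed sd of 1.0 logit and two persons with SE = 0.5.
print(reliability(1.0, [0.5, 0.5]))
print(round(strata_reliability(4), 2))
```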
The HLS-Q12 scale
The three health domains and the four cognitive domains of the 'integrated conceptual model of health literacy' [18] form a '3x4 matrix' of 12 cells. This matrix or scaffold supported the HLS-EU-Q47 item development (Table 1).
Applying confirmatory factor analyses and Rasch modelling, Finbråten et al. [6,7] developed the 12-item HLS-Q12 scale from the 47-item HLS-EU-Q47 by deleting items that displayed differential item functioning (DIF; [50]) for available person factors (gender, age and education level), items that displayed unordered response categories or reversed thresholds [51] and redundant items. Redundant items were identified as the qualitatively least relevant items from pairs of dependent items (response dependency; [52]). If more than one item was still left in a 'cell' in the 3x4 matrix, the least relevant item(s) were deleted based on qualitative evaluation of item content. The purpose was to obtain a conceptually balanced scale [53], consisting of 12 items in line with the '3x4 matrix', which met the Rasch model expectations. Finally, the PCA/t-test protocol [54,55] was applied to test whether the HLS-Q12 scale was sufficiently unidimensional. In this method, a principal component analysis (PCA) of Rasch residuals was used to identify two subsets of the items; each person was assigned a proficiency estimate on each subset, and the two estimates were compared using dependent t-tests to account for the dependence between the two measures. The HLS-Q12 scale was interpreted as unidimensional, as only approximately 5% of these t-tests were significant. The process of establishing the HLS-Q12 and the psychometric properties of the HLS-Q12 items are thoroughly described in Finbråten et al. [6,7].
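The final step of the PCA/t-test protocol, counting how many persons show significantly different estimates on the two item subsets, can be sketched as below. The sketch assumes a per-person statistic of the form (difference between the two estimates) divided by the square root of the sum of the squared SEs, judged against ±1.96. The source does not state the exact statistic computed by the software, so this form, the function name and the data are assumptions for illustration only.

```python
import math

def proportion_significant(est_a, se_a, est_b, se_b, critical=1.96):
    """Share of persons whose two subset estimates differ significantly.

    Assumed statistic: t = (a - b) / sqrt(se_a^2 + se_b^2).
    """
    flagged = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            flagged += 1
    return flagged / len(est_a)

# Invented estimates (logits) for four persons on two item subsets.
subset_a = [0.0, 1.0, 3.0, -1.0]
subset_b = [0.1, 0.8, 0.0, -1.2]
ses = [0.5, 0.5, 0.5, 0.5]
print(proportion_significant(subset_a, ses, subset_b, ses))
```

A proportion close to 5% is what chance alone would be expected to produce at the 5% significance level, which is the reasoning behind interpreting the scale as unidimensional.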

<PLEASE INSERT TABLE 1 HERE>
Standard setting: the process of establishing benchmark cut-off scores
Essentially, standard setting refers to the process of establishing cut-off scores on a test. Several methods exist for this purpose, such as the Nedelsky, Angoff and Bookmark methods [56]. These methods are time- and resource-consuming because they require the selection and training of experts to make judgements about individual items. They may also suffer from judgmental biases, fail to yield results that agree with one another (poor validity) and fail to yield the same results on repeated applications (poor reliability) [56]. Further, they do not directly apply to rating-scale items typical for the HLS-Q12 questionnaire.
Wright's sample-independent method for the maximum number of statistically distinct strata [12] identifies significantly distinct cut-off scores based on Rasch modelling. The Wright method therefore applies to rating-scale item formats with integer scoring models and is efficient in terms of costs and time commitments. However, the cut-off scores are 'found' by estimation; they are not 'set' by experts.
Since the purpose of the HLS-Q12 proficiency scale is to express proficiency not relative to other respondents (norm-referenced) but in terms of HL skills with reference to HLS-Q12 item content (criterion-referenced) at three or more proficiency levels (low, medium and high), the Wright method is applicable. Three or more proficiency levels will serve diverse purposes, such as formative patient education, mapping of patients' HL in the health services, controlling information for accountability and population health studies. In addition, the HLS-Q12 continuous person proficiency estimates can become the data for subsequent statistical analysis, such as regression analysis, where HL is the dependent variable.
Once we have calibrated the HLS-Q12 items, the person proficiency estimate and standard error corresponding to every possible HLS-Q12 sum score are computed. Having the standard errors, we can easily compute the number of statistically distinct levels of performance by using Wright's method [12]. By starting at the lowest proficiency estimate and advancing by twice the pooled standard error, we can identify the lowest score of the next proficiency level by locating the lowest sum score that is outside the confidence interval upper bound. We repeat the procedure until there is no room for another level. These sum scores define 'benchmark cut-off scores' for statistically distinct proficiency levels.
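The stepping procedure described above can be sketched in a few lines of Python. Two points are assumptions rather than details from the source: the 'pooled standard error' is read here as the joint SE of the difference between two measures, sqrt(SE_a^2 + SE_b^2), and the score-to-measure table is synthetic (a linear conversion with a constant SE). The output therefore illustrates the mechanics only and does not reproduce the HLS-Q12 cut-off scores.

```python
import math

def wright_cutoffs(score_table):
    """Find cut-off sum scores for statistically distinct levels.

    score_table: list of (sum_score, measure, se), sorted by sum_score.
    Assumption: a new level starts at the lowest score whose measure lies
    more than twice the joint SE above the start of the current level.
    """
    cutoffs = []
    _, base_measure, base_se = score_table[0]
    for score, measure, se in score_table[1:]:
        joint_se = math.sqrt(base_se ** 2 + se ** 2)
        if measure - base_measure > 2.0 * joint_se:
            cutoffs.append(score)
            # The cut-off score becomes the start of the next level.
            base_measure, base_se = measure, se
    return cutoffs

# Synthetic table for sum scores 12..48: 0.25 logits per point, SE = 0.5 throughout.
table = [(s, (s - 30) * 0.25, 0.5) for s in range(12, 49)]
print(wright_cutoffs(table))
```

In a real calibration the SEs are larger at the extremes of the score range than in the middle, so levels are wider at the ends and fewer levels fit on the scale than in this constant-SE illustration.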

Developing a proficiency scale: item mapping for criterion-referenced literacy levels
Using dichotomously scored items, such as multiple-choice items, we would map the item on the continuum at the place where the item information is maximized, i.e., at the item location. Individuals with proficiency estimates matching the item location have a response probability value of .50. Thus, an item is mapped at the point on the continuum where we would expect that respondents have the skills underlying the item.
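The statement that a dichotomous item is most informative at its location can be checked with a short derivation under the dichotomous Rasch model. The symbol $I_i(\beta)$ for item information is introduced here for illustration and is not notation from the source.

```latex
\begin{align*}
p_i(\beta) &= \frac{e^{\beta - \delta_i}}{1 + e^{\beta - \delta_i}},
&
I_i(\beta) &= p_i(\beta)\,\bigl(1 - p_i(\beta)\bigr), \\
% p(1 - p) is largest when p = 1/2, which occurs exactly when beta = delta_i
\beta = \delta_i &\;\Longrightarrow\; p_i(\beta) = \tfrac{1}{2},
&
I_i(\delta_i) &= \tfrac{1}{4} = \max_{\beta} I_i(\beta).
\end{align*}
```

Because $p(1-p)$ peaks at $p = .50$, the point of maximum information and the point where the response probability equals .50 coincide at the item location.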
For polytomously scored items, such as constructed-response items, Huynh [57] argues that it may be more informative to emphasise the location of each separate response, i.e., the item thresholds. Transferring this idea to the HLS-Q12 rating scale items, the second item threshold, which locates the position on the latent trait where the typical response becomes 'easy', is the location where we would expect respondents to typically have the skills underlying the item. Using item mapping, we can then anchor the second uncentralised HLS-Q12 item thresholds along the measurement scale and describe the Wright cut-off scores in terms of HLS-Q12 item content.
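Item mapping of this kind amounts to sorting items by the location of their second threshold relative to the benchmarks. The sketch below uses the three benchmark locations reported in the Results (-0.76, 0.74 and 2.19 logits); the item thresholds are hypothetical placeholders, since the calibrated values are not given in the text, and the function name is likewise illustrative.

```python
# Benchmark locations in logits, as reported in the Results, with level labels.
BENCHMARKS = [(-0.76, "marginal"), (0.74, "intermediate"), (2.19, "advanced")]

def level_for_item(second_threshold):
    """Return the lowest level whose benchmark lies above the item's second threshold."""
    for location, label in BENCHMARKS:
        if second_threshold < location:
            return label
    return None  # threshold above the highest benchmark

# HYPOTHETICAL second thresholds (logits); not the calibrated HLS-Q12 values.
HYPOTHETICAL_THRESHOLDS = {
    "q23": -1.60, "q32": -1.20,
    "q2": -0.10, "q7": 0.30,
    "q28": 1.40, "q10": 1.90,
}

item_map = {}
for item, threshold in HYPOTHETICAL_THRESHOLDS.items():
    item_map.setdefault(level_for_item(threshold), []).append(item)
print(item_map)
```

Each level can then be described by the content of the items mapped to it, which is the criterion-referenced interpretation the proficiency scale is meant to support.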

Results
Identifying the HLS-Q12 cut-off scores for statistically distinct levels of proficiency
The HLS-Q12 displayed good overall data-model fit (RMSEA ≈ 0.0) and sufficient reliability (Wright strata reliability = 0.94) for both samples (see overall statistics in Table 2).
We revised and confirmed the empirically identified cut-off scores 27, 33 and 39 among individuals with diabetes, where the lowest observed HLS-Q12 sum score was 21 points ($\beta = -2.32$, SE = 0.53). As the sum scores 44 and 45 points were not observed in the diabetes sample, we could not strictly confirm the upper cut-off score at 46 points.

<PLEASE INSERT TABLE 2 HERE>
Describing the HLS-Q12 cut-off scores in terms of HLS-Q12 item content
The endorsability of the response category 'easy', that is, the difficulty of the second item threshold, within items q23, q32, q43 and q14 was located below benchmark 1 ($\delta_{i3} = -0.76$, equivalent to a sum score of 27 points). Skills auditing implies that individuals who reach benchmark 1 typically view it as 'easy' to 'understand why they need health screenings' (q23), 'find information on healthy activities such as exercise, healthy food and nutrition' (q32), 'judge which everyday behaviour is related to health' (q43) and 'follow the instructions on medication' (q14). The following describes typical perceived skills at level 1, which we, based on interpretation of content for items q23, q32, q43 and q14, labelled marginal HL: Individuals with an HLS-Q12 score of 27 or above can typically access, appraise and apply health information relevant to stay healthy.
Likewise, the difficulty of the second score step within items q2, q7, q44, q18, q30 and q38 was located below benchmark 2 ($\delta_{i3} = 0.74$, equivalent to a sum score of 33 points) but above benchmark 1. Skills auditing implies that individuals who reach benchmark 2 typically can 'find information on treatments of illnesses that concern them' (q2), 'understand what to do in a medical emergency' (q7), 'make decisions to improve their health' (q44), 'find information on how to manage mental health problems like stress or depression' (q18), 'decide how they can protect themselves from illness based on advice from family and friends' (q30) and 'understand information on food packaging' (q38). The following generalized interpretation of item content describes typical skills at level 2, which we labelled intermediate HL: Individuals with an HLS-Q12 score of 33 or above can typically access, appraise and apply health information and advice relevant to enhance physical and mental health.
Individuals who reach benchmark 3 typically master the knowledge and skills associated with items q10 and q28, the HLS-Q12 items that are hardest to endorse. It follows that the second score step within each of these items is located below benchmark 3 ($\beta_n = 2.19$, equivalent to a sum score of 39 points). This means that individuals who reach level 3 typically can 'judge if the information on health risks in the media is reliable' (q28) and 'judge the advantages and disadvantages of different treatment options' (q10). The ability to choose between treatments might indicate that individuals at level 3 can identify suitable and optimal care, i.e., navigate in the health care system and between health care services [29]. The following describes typical perceived skills at level 3, which we labelled advanced HL: Individuals with an HLS-Q12 score of 39 or above can typically access, appraise and apply health information and advice relevant to make informed healthcare choices by critically judging health claims and fairly comparing treatments.
It is vital to recognize that the HLS-Q12 proficiency scale a) is constructed such that individuals even at the bottom of a proficiency level typically master the knowledge and skills associated with that level, b) describes progression in HL, that is, what it means to move from one proficiency level to the next, and c) is cumulative, as individuals scoring at the advanced level (sum score ≥ 39) also know and can do what is associated with performances at the intermediate level (33 ≤ sum score ≤ 38) and marginal level (27 ≤ sum score ≤ 32). Individuals who perform below the marginal level (12 ≤ sum score ≤ 26) may, for example, understand why they need health screenings (q23), like screenings for diabetic retinopathy if they have type 2 diabetes, but they do not possess a repertoire of knowledge and skills needed to reduce the risk of injury, stay healthy and prevent and manage illness. We may refer to individuals performing at or below the marginal level as displaying limited HL (12 ≤ sum score ≤ 32). The HLS-Q12 performance levels 'below marginal', marginal, intermediate and advanced may resemble the HLS-EU-Q47 levels considered as insufficient, problematic, sufficient and excellent HL, respectively.
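The score ranges given above translate directly into a lookup from sum score to proficiency level. The following sketch uses the cut-off scores 27, 33 and 39 and the 12 to 48 score range from the text; the function names are illustrative.

```python
def hl_level(sum_score):
    """Map an HLS-Q12 sum score (12..48) to its proficiency level."""
    if not 12 <= sum_score <= 48:
        raise ValueError("HLS-Q12 sum scores range from 12 to 48")
    if sum_score >= 39:
        return "advanced"
    if sum_score >= 33:
        return "intermediate"
    if sum_score >= 27:
        return "marginal"
    return "below marginal"

def has_limited_hl(sum_score):
    """Limited HL: performing at or below the marginal level (12..32)."""
    return hl_level(sum_score) in ("below marginal", "marginal")

for score in (26, 27, 32, 33, 38, 39):
    print(score, hl_level(score), has_limited_hl(score))
```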

Estimating the proportion who reach each level of proficiency
A key finding is that almost 1 in 2 (47%) in the Norwegian population sample performed at the marginal level. As only 2 in 100 (2%) performed below the marginal level, the prevalence of limited HL in the Norwegian population is close to 50%. While almost 4 in 10 (39%) reached the intermediate level, approximately 1 in 10 (12%) performed at the advanced level. Another key finding is that 8% of individuals with diabetes performed below level 1. Among individuals with diabetes, the proportions of individuals performing at the marginal, intermediate and advanced levels were 45%, 34% and 13%, respectively.

Discussion
Applying Wright's method for the maximum number of strata, we located the benchmark cut-off scores for statistically distinct levels of health literacy (HL) at the HLS-Q12 cut-off scores 27, 33 and 39, and we found that Wright's method yielded consistent results on repeated applications in independent samples (population sample and sample of persons with diabetes). Using item mapping based on Rasch calibration of rating scale items, we developed a proficiency scale that describes what people who reach each benchmark typically know and can do in terms of the HLS-Q12 item content. We found that individuals who reach these cumulative levels of HL typically can access, appraise and apply information relevant to stay healthy (marginal HL), improve health (intermediate HL), and critically judge health claims and compare treatments (advanced HL), respectively. The resulting proficiency scale expresses progression in HL: what it qualitatively may mean to move from low to high HL.
The newly published Norwegian strategy for HL [4] responds to the growing body of international literature documenting the magnitude of limited HL [14]. We found that approximately 50% of the Norwegian population and 53% of individuals with type 2 diabetes may have limited HL. These results add to the evidence of a high prevalence of limited HL among European residents [1]. A key finding is that 8% of individuals with diabetes scored less than 27 points and performed below the marginal level. It follows that these individuals most probably have difficulties with acting on health information relevant to preventing and managing diabetes-related conditions. However, a thought-provoking result is some medical doctors' malpractice of presuming that patients with low HL will get help from their family and friends at home (item q30) if they do not understand or cannot read the written health information provided [58]. This is a kind of non sequitur, as applying advice from family and friends is associated with HL at the intermediate level. Moreover, the chance to benefit from your right to choose a personally preferred method of treatment may have passed when you finally receive help from family and friends to understand the written information provided, as the doctor might already have decided to undertake the treatment perceived as most appropriate. Patients within the Norwegian healthcare services have a statutory right to choose a treatment institution and to choose among the prudent methods of treatment available [58]. Nevertheless, to navigate within the healthcare systems and services and benefit from your statutory rights, you need skills to critically judge claims about treatment quality and methods of treatment and to compare them fairly, skills associated with navigation literacy and advanced HL. This means that very few people are proficient at benefiting from their right.
As the short form HLS-Q12 was developed by reducing the 47-item questionnaire HLS-EU-Q47 to a 12-item scale based on Rasch modelling and confirmatory factor analysis [6,7], we see few sources of potential bias and threats to internal validity. By applying a recognised statistical method for identifying cut-off scores [12] and confirming these values by analysing responses from two independent samples, we believe the HLS-Q12 cut-off scores are accurate and precise.
Robust measurement properties combined with brevity make HLS-Q12 preferable to HLS-EU-Q47 at present for monitoring HL in individuals and in populations. However, a limitation of our study is that we based our analyses only on Norwegian samples. When assessing HL across countries or groups from different cultural and linguistic backgrounds, groups at the same level of HL may not display the same probability of affirming the items. Then we would deem these items to display differential item functioning (DIF), and cultural and linguistic backgrounds would confound the measurement of health literacy.
The degree to which DIF threatens the external validity and generalizability of our findings remains unresolved. However, the upcoming M-POHL international study will clarify whether HLS-EU-Q47 and its short forms, like HLS-Q12, yield comparable results. The M-POHL study will also identify causal factors associated with limited HL. Another issue that remains unresolved is the extent to which the HLS-Q12 proficiency levels can predict future 'success' in terms of staying healthy, improving health and making informed healthcare choices through the life course.

Conclusions
Using Rasch-based item calibration and person measurement, we mapped the items and estimated benchmark cut-off scores. The resulting HLS-Q12 proficiency scale expresses typical knowledge and skills at three statistically distinct proficiency levels. Based on qualitative interpretation of item content, we labelled the levels marginal, intermediate and advanced, respectively. It is evident from the scale's cumulative nature that it reveals progression in health literacy (HL).
Our description of increasingly sophisticated levels in HL has important theoretical outcomes, as it extends the 'integrated conceptual model of health literacy' developed by Sørensen et al. [18]. Recognizing that there is a significant gap between 'avoiding injury and staying healthy' (marginal HL) on one hand and 'knowing how to improve mental and physical health' (intermediate HL) on the other may inform policy-makers aiming at addressing health disparities. Our study also demonstrates the supremacy of Rasch modelling when it comes to identifying benchmark cut-off scores and describing criterion-referenced proficiency levels.
Designed for constructing measurement, the efficient HLS-Q12 scale enables clinicians to gather reliable information about an individual's HL. The goal may be to immediately tailor health information, adapt communication and inform interventions at the personal level. The formative aspects of being able to identify people's HL, and knowing which skills they need to develop to improve HL, have substantial clinical value for health personnel involved in patient education. The HLS-Q12 can potentially facilitate researchers' endeavours in evaluating risk factors and providing recommendations for resource allocation at the local level or providing standards with which population HL can be compared at the global level. For such large-scale studies, the HLS-EU-Q47 'indicators' may specify how we can best allocate resources to gain effect on various aspects of HL nationally and globally. Researchers may select the unidimensional short form HLS-Q12 or the multidimensional HLS-EU-Q47, depending on whether or not they need to construct measurement.

Consent for publication
Not applicable

Abbreviations
promotion (HP)
M-POHL: The Action Network on Measuring Population and Organizational Health Literacy
PCA: Principal Component Analysis of Rasch residuals
PCM: Partial Credit Model
RMSE: Root Mean Square Error (the average measurement error variance of person proficiency estimates)
RMSEA: Root Mean Square Error of Approximation (a chi-square based supplementary data-model fit index for accommodating large sample sizes)
RUMM: Rasch Unidimensional Measurement Model (refers to RUMM2030 software)
sd: the sample or "observed" standard deviation of person proficiency estimates
SD: the population or "true" standard deviation of person proficiency estimates
SE: Standard Error

Declarations
Ethics approval and consent to participate
This study was approved by the Norwegian Social Science Data Service (NSD), ref. 38,917. According to NSD, this study did not require ethics approval from an ethics committee. Participation was voluntary, and respondents were anonymous and aged 16 years or above. Using telephone interviews, verbal informed consent was obtained from the participants. NSD approved the procedure.