Validation of the Assessments of Adult Health Literacy: A Rasch Measurement Model Approach

Background: Both the World Health Organization and the U.S. Department of Health and Human Services have emphasized the importance of health literacy (HL) to improving population health and reducing health disparities. HL includes three core areas/qualities: functional (i.e., health-related reading, writing, and numeracy), interactive/communicative (i.e., skills for interacting with multiple constituents and sources of information and navigating the health environment), and critical (i.e., personal and community advocacy for health). HL is implicated in medical adherence, preventive health, mental health stigma and help-seeking, and health decision-making. Though HL is critical to health and health decision-making, research on HL is still relatively limited, with most research focusing on functional HL. A major gap in research is the lack of measurement of interactive and critical HL. To address this gap, this study modified and assessed the validity of the Assessments of Adolescent Health Literacy (AAHL-Adolescent), test-based assessments of adolescents' functional, interactive, and critical HL, in an adult sample. Methods: One item from the AAHL-Adolescent item bank was modified to be more appropriate for an adult sample. Adults (n=2346) completed a measurement battery that included the HL item bank (12 functional, 15 interactive, and 9 critical HL questions), Newest Vital Sign (NVS), Single-Item Literacy Scale (SILS), demographics, and questions about HL-related behaviors. The assessments were evaluated and validated using Rasch measurement models. Convergent and criterion validity were assessed. Results: The final 7-item functional, 10-item interactive, and 7-item critical HL assessments and their composite (24 items) fit their respective Rasch models. Item-level invariance was established for gender, ethnicity, education, and age across all assessments. Differential item functioning for race was noted for two items on the interactive HL assessments.
Good convergent validity with the NVS and SILS and good criterion validity with the HL-related behaviors were observed for all assessments. Conclusions: The AAHL-Adult is the first test-based instrument validated in the U.S. that includes assessments for all three core qualities of HL. These assessments have utility across multiple settings, including public health program planning and evaluation, intervention development and evaluation, and clinical settings.


Current Study
Given the lack of test-based assessments for interactive and critical HL, this study aimed to assess the validity of the Assessments of Adolescent Health Literacy (AAHL-Adolescent) in an adult sample. The AAHL-Adolescent includes test-based assessments of functional, interactive, and critical HL developed and validated in an adolescent sample using Rasch Measurement Models (21). The Rasch measurement model is a probabilistic model that applies sample-independent methodology by fitting data to a measurement model rather than to a sample, as is done in classical test theory (22). An important consideration of the Rasch measurement model is that item difficulty and person ability are measured independent of the sample and items in the measure, respectively (23). The methodology involves calculating the probability of a person responding in a specific manner to a specific item with the assumption that persons with higher ability will have higher probabilities of correctly endorsing the items, and items with lower probabilities of being correctly endorsed are those with higher difficulties. For the current study, minor changes were made to the wording of some items in the AAHL-Adolescent to make it more applicable to adults. The Rasch measurement model and other reliability and validity measures were used to assess the utility of the adapted AAHL-Adolescent for adults.
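The relationship between person ability and item difficulty described above can be illustrated with the standard dichotomous Rasch formula. The following is a minimal sketch, not the estimation procedure used in this study; the function name and the example values are illustrative assumptions, and both parameters are taken to be on the same logit scale.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are assumed to be expressed in logits on a common scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A person whose ability equals the item difficulty has a 50% chance of success.
p_equal = rasch_probability(ability=0.0, difficulty=0.0)
# Higher ability raises the probability; higher difficulty lowers it.
p_able = rasch_probability(ability=1.0, difficulty=0.0)
p_hard = rasch_probability(ability=0.0, difficulty=1.0)
```

Because only the difference between ability and difficulty enters the formula, the same item ordering holds for every person, which is the property that motivates the sample-independence argument.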

Study Design
This study assessed the psychometric properties of the Assessments of Adult Health Literacy (AAHL-Adult), an adult version of the AAHL-Adolescent (21). This study was approved by the Tufts University Social Behavioral and Educational Institutional Review Board. Participants provided consent prior to completing the assessment battery.

Sample and Data Collection
Data were collected as part of a study of parent-adolescent dyads' HL and health behaviors. Data were collected from 2346 parents between February and October 2020 using the Qualtrics survey panel. Qualtrics randomly selected a sample that was 45% male, 50% with less than a college degree, 40% income >$100,000, and predominantly racial and ethnic minorities (i.e., no more than 35% White). These quotas ensured meaningful comparisons could be made for groups that tend to be underrepresented in research. Qualtrics' recruitment included sending emails to a stratified (based on aforementioned characteristics) random sample of panelists who were 18 years or older. Panelists who opted to participate in the study first saw a screen with the consent form and were directed to the survey once they consented to participate. If they did not agree to participate, they were thanked for their time and did not see any further information. The survey included questions about HL and health behaviors. Attention check questions were also included to ensure participants were attentive throughout the survey. The survey took about 20 minutes to complete, and participants chose incentives determined by Qualtrics (e.g., airline miles, gift cards, redeemable points).

Measures

Demographics
Participants self-reported their demographic characteristics. Age was reported in years. Participants selected their gender from six options: male, female, transgender (male to female), transgender (female to male), transgender (gender non-conforming), and other. Participants indicated whether they were of Hispanic, Latino/a, or Spanish origin. They also selected their race from the following options: American Indian or Alaskan Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, White, multiple races, and prefer not to answer. Participants indicated their highest level of education completed from the following options: less than high school, high school graduate or equivalent, some college, Associate's degree (including occupational or academic degrees), Bachelor's degree, and graduate degree. Questions on gender, race, ethnicity, education, and income also included 'prefer not to answer' as an option.

Assessments of Adult Health Literacy (AAHL-Adult)
The AAHL-Adult is the adult version of the AAHL-Adolescent and includes three test-based HL assessments that align with Nutbeam's (12) definitions of functional, interactive/communicative, and critical HL. The AAHL-Adolescent was developed using a multi-phase mixed methods design, and the questionnaire development process is summarized here and reported in detail elsewhere (21). The initial item bank for the AAHL-Adolescent was developed via an iterative process involving focus groups and cognitive interviews with adolescents and consultations with experts to ensure content and face validity. After multiple rounds of revisions, the revised item bank included 12 functional HL items, 15 interactive HL items, and 9 critical HL items. The assessments were validated using the Rasch Measurement Model, and item-level invariance was established across gender (boys vs. girls), ethnicity, and age groups (12-15-year-olds vs. 16-18-year-olds). The final functional (6 items), interactive (10 items), and critical (7 items) HL assessments had good convergent validity with the Newest Vital Sign (NVS) and were positively related to applied HL behaviors (e.g., reading instructions before taking medicines). For the AAHL-Adult, the revised AAHL-Adolescent item bank was retained, and minor wording changes were made to 1 item (i.e., in a scenario related to school, the adult version asked about actions to help a child while the adolescent version asked about actions to help themselves) to make it more applicable to adults.

Newest Vital Sign (NVS)
Participants completed the NVS, a 6-item measure of their functional HL (24). For the NVS, participants were presented with an ice cream nutrition label and asked to answer 5 to 6 questions that assessed their reading and numeracy skills. Responses were scored, summed, and categorized into NVS-High Likelihood of Limited Literacy (0-1 correct), NVS-Possibility of Limited Literacy (2-3 correct), and NVS-Adequate Literacy (≥ 4 correct). The NVS has good internal consistency (Cronbach α = 0.76) (24) and summed scores were used to evaluate convergent validity with the AAHL-Adult.
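The NVS categorization described above amounts to a simple lookup on the number of correct answers. A minimal sketch follows, assuming the summed score is an integer from 0 to 6; the function name is a hypothetical chosen for illustration.

```python
def nvs_category(correct: int) -> str:
    """Map an NVS summed score (0-6 correct) to its literacy category."""
    if not 0 <= correct <= 6:
        raise ValueError("NVS summed scores range from 0 to 6")
    if correct <= 1:
        return "NVS-High Likelihood of Limited Literacy"
    if correct <= 3:
        return "NVS-Possibility of Limited Literacy"
    return "NVS-Adequate Literacy"
```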

Single-Item Literacy Scale (SILS)
The Single-Item Literacy Scale (SILS) is a 1-item perceptions-based measure of functional HL. Participants were asked, "How often do you need to have someone help you read instructions, pamphlets, or other written material from your doctor or pharmacy?" (25). Participants' possible responses included never, rarely, sometimes, often, and always. Participants who endorsed never or rarely were categorized as SILS-Adequate HL, while all other responses were categorized as having difficulties with reading and writing health-related information (SILS-Inadequate HL). In their validation sample, Morris and colleagues (25) found that the measure had 54% sensitivity and 83% specificity in distinguishing people with limited reading ability.
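The dichotomization of the SILS described above can be sketched as follows. The function and constant names are hypothetical, and the lowercase response labels are an assumption about how the answers would be coded.

```python
SILS_RESPONSES = ("never", "rarely", "sometimes", "often", "always")


def sils_category(response: str) -> str:
    """Dichotomize a SILS response into Adequate vs. Inadequate HL."""
    answer = response.strip().lower()
    if answer not in SILS_RESPONSES:
        raise ValueError(f"unexpected SILS response: {response!r}")
    # "never" and "rarely" indicate adequate HL; all other answers do not.
    if answer in ("never", "rarely"):
        return "SILS-Adequate HL"
    return "SILS-Inadequate HL"
```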

HL Behaviors
Similar to the AAHL-Adolescent (21), participants were asked whether they engaged in two behaviors: questioning the truthfulness of health information found online and reading instructions before taking medicine. These behaviors are consistent with Sorensen and colleagues' (2) description of the applied use aspect of HL. These items were used to establish criterion validity for the AAHL-Adult.

Statistical Analyses
Winsteps (26) software was used to estimate Rasch measurement models. All other analyses were conducted in SPSS 28 (27).

Rasch Analyses
Model Type. The functional and interactive HL items had distinct correct and incorrect answers; therefore, Rasch dichotomous models were used to assess these scales. Several critical HL items included answers that spanned different levels of critical HL, so assessing the critical HL scale using the Rasch Partial Credit Model was most appropriate. Given that the composite model included all of the final items across the scales, including dichotomous and polytomous items, the Rasch Partial Credit Model was used to estimate this model.
Model Fit. Person and item parameters were estimated using joint maximum likelihood estimation. Fit statistics (i.e., infit and outfit mean-squares and their standardized equivalents) were scrutinized for item and person parameters. Specifically, standardized statistics were reviewed only if mean-squares were greater than 1.5, and if these standardized statistics were > 2 or < -2, then a significant misfit was indicated and the item was considered for removal (28). Given that outfit statistics are more sensitive than infit statistics to misalignment of person responses and item difficulties, more emphasis was put on outfit statistics. For each assessment, items were removed iteratively, starting with the most misfitting item (e.g., highest mean-square outfit misfit and standardized outfit statistic > 2). The models were re-estimated and fit statistics were reviewed after each item removal. As proposed by Linacre (28) and done in other studies (29,30), person misfit considerations included removing one round of the most misfitting responses, re-estimating the model, and comparing the fit statistics of the modified and original responses. If the modified responses improved model fit, then the modified dataset was retained and final analyses were conducted on this dataset. However, if the removal of responses did not improve model fit, the original dataset with all responses was retained.
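The item-misfit screening rule above can be expressed as a short decision function. This is a sketch of the stated thresholds only, not of Winsteps output handling; the function names, the dictionary layout, and the example values are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


def is_misfitting(mnsq: float, zstd: float,
                  mnsq_cut: float = 1.5, zstd_cut: float = 2.0) -> bool:
    """Flag misfit only when the mean-square exceeds 1.5 AND the
    standardized statistic is beyond +/-2, following the two-step rule."""
    return mnsq > mnsq_cut and abs(zstd) > zstd_cut


def most_misfitting_item(outfit: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Return the flagged item with the highest outfit mean-square, or None.

    `outfit` maps an item label to (outfit mean-square, standardized outfit).
    """
    flagged = {name: stats for name, stats in outfit.items()
               if is_misfitting(*stats)}
    if not flagged:
        return None
    return max(flagged, key=lambda name: flagged[name][0])


# Hypothetical fit statistics for four items (not values from this study).
example = {"A": (1.2, 3.0), "B": (1.8, 2.5), "C": (2.1, 1.0), "D": (1.6, 2.4)}
candidate = most_misfitting_item(example)  # "B"
```

In an iterative workflow, the returned item would be dropped, the model re-estimated, and the function applied again to the new fit statistics until it returns None.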
Items with negative and very low point-measure correlations are indicative of the items not belonging to the scale (22) and were removed. Key assumptions of Rasch were assessed at each iteration of model estimation. Unidimensionality (do items assess a shared latent construct?) was established if the eigenvalue of the unexplained variance in the first contrast in the principal components analysis of the residuals was < 2 (22,31). Local independence (are item responses statistically independent of each other?) was confirmed by comparing the Q3,* test statistic (i.e., the maximum standardized residual correlation between a pair of items [Q3,max] minus the mean of all standardized residual correlations between item pairs [mean Q3]) to the critical values reported by Christensen and colleagues (32). Critical values for the Q3,* test statistic at the 95th and 99th percentile are 0.26 and 0.31, respectively. To assess the monotonicity of the latent trait (are scores monotonically non-decreasing across the latent trait?) assumption, test characteristic curves were scrutinized to ensure they were monotonically ascending (33). As with other studies using Rasch, statistical findings and theoretical reasonings informed final decisions to retain or delete items (34). See Table 1 for item difficulties and fit statistics for the final assessments.
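The Q3,* statistic defined above (largest pairwise residual correlation minus the mean of all pairwise residual correlations) can be computed from a residual correlation matrix as in the sketch below. The function name and the small example matrix are illustrative assumptions; the 0.26 threshold is the 95th-percentile critical value quoted in the text.

```python
from itertools import combinations
from typing import Sequence


def q3_star(resid_corr: Sequence[Sequence[float]]) -> float:
    """Q3,* = max off-diagonal residual correlation minus the mean of
    all off-diagonal residual correlations (each item pair counted once)."""
    n = len(resid_corr)
    if n < 2:
        raise ValueError("at least two items are required")
    pairs = [resid_corr[i][j] for i, j in combinations(range(n), 2)]
    return max(pairs) - sum(pairs) / len(pairs)


# Hypothetical 3-item residual correlation matrix (not data from this study).
matrix = [
    [1.0, 0.1, -0.2],
    [0.1, 1.0, 0.3],
    [-0.2, 0.3, 1.0],
]
statistic = q3_star(matrix)
locally_independent = statistic < 0.26  # compare with the 95th-percentile value
```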
Reliability. Item- and person-separation reliability were examined. For item-separation reliability, the goal is to have a good item difficulty range indicated by item-separation reliability statistics closer to 1. Regarding person-reliability, both Rasch person-reliability and classical test theory reliability statistics assume symmetry in ability. Wright (35) argued that symmetry in ability is rare in health-related research and proposed the Wright sample-independent person (test) reliability statistic to account for the lack of symmetry in samples. The Wright sample-independent person (test) reliability statistic is calculated after measure calibration (35) and used in place of Winsteps' person reliability statistics. The 2-step procedure involves determining the number of strata across the scores and then using this to calculate the sample-independent reliability. Given the focus on a health topic, sample-independent reliability was appropriate for this study.

Other Analyses
Descriptive statistics were calculated for the final AAHL-Adult. Pearson correlations between the summed scores of the NVS and the AAHL-Adult were computed to establish convergent validity (40). Point-biserial correlations between the dichotomized SILS and the assessments of HL were also computed to assess convergent validity. The NVS and SILS only measure functional HL; therefore, only moderate correlations with the interactive and critical HL assessments were expected. To assess criterion validity, logistic regressions predicting HL-related behaviors from AAHL-Adult after accounting for demographics (i.e., age, gender, race, ethnicity, education) were modeled. Receiver Operating Characteristic (ROC) curves (41) were estimated to determine the sensitivity (the proportion of actual positives that are correctly identified) and specificity (the proportion of actual negatives that are correctly identified) of AAHL-Adult assessments to predict NVS and SILS categories as well as the HL-related behaviors. Mean scores and categorizations across assessments were compared using independent t-tests and one-way analysis of variance. Where applicable, Bonferroni corrections were calculated. Effect sizes (the mean difference between z-scores across categories of the other assessments) were also computed. Lastly, crosstabs, chi-squares, and Cramer's V tests were calculated to illustrate how categorizations across the assessments were related.
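To make the cutoff-based sensitivity and specificity analyses concrete, the sketch below computes both quantities for a single cutoff score. It assumes that a score above the cutoff counts as a positive prediction and that the reference classification (e.g., an NVS category or a reported behavior) is coded as a boolean; the function name and the toy data are hypothetical, and this is not the SPSS procedure used in the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float],
                            has_condition: Sequence[bool],
                            cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores above `cutoff`
    are treated as positive predictions."""
    tp = fn = tn = fp = 0
    for score, actual in zip(scores, has_condition):
        predicted = score > cutoff
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example with five respondents (not data from this study).
toy_scores = [2, 4, 5, 7, 8]
toy_labels = [False, False, True, True, True]
low_cut = sensitivity_specificity(toy_scores, toy_labels, cutoff=4.5)
high_cut = sensitivity_specificity(toy_scores, toy_labels, cutoff=7.5)
```

Raising the cutoff can only hold or reduce sensitivity and hold or increase specificity, which is the trade-off between lower and higher cutoff scores reported throughout the Results.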

Results
The sample included 2346 adults who were parents or caregivers of adolescents (Mean age = 43.02 years, SD = 8.41; ~63% women). Five participants identified as transgender, and one selected 'prefer not to answer.' Given the discrepancy in subsample sizes, gender-specific analyses were only conducted on persons who identified as binary (i.e., male, female). The largest racial group was White (~63%) and ~13% of the sample identified as of Hispanic or Latin origin. Approximately 53% of the sample had less than a Bachelor's degree, and most participants had Adequate Literacy as assessed by the NVS (~55%) and SILS (~83%). See Table 2 for descriptive statistics.

AAHL-Adult Functional Health Literacy (AAHL-Adult-FHL)
The AAHL-Adult-FHL item bank included 12 items. After the Rasch analyses, 5 items were removed, leaving a 7-item final AAHL-Adult-FHL assessment (see Appendix A). The final items assessed adults' numeracy and reading comprehension skills using both a cafeteria menu and an over-the-counter prescription drug label. Two items were removed due to outfit misfit. Three items were removed due to DIF on race. Removal of one round of the most misfitting person responses did not improve model fit; as such, the final model estimation used the original dataset. Point-measure correlations for the final 7 items were 0.43-0.61. Assumptions of unidimensionality (eigenvalue = 1.34), local independence (Q3,* test statistic = -0.14), and monotonicity were met. No DIF was detected for age group, gender, race, education, and ethnicity. Item separation reliability (1.00) was acceptable. The Wright sample-independent person (test) reliability statistic was 0.71, and the scores differentiated 1.6 distinct levels of performance: Inadequate (scores 0-4) and Adequate (scores 5-7). Participants with an Associate's/some college had higher AAHL-Adult-FHL scores than those with ≤ high school diploma (MD = 0.40, p < 0.001).
Participants who had NVS-Adequate Literacy had higher AAHL-Adult-FHL scores than those who had NVS-High Likelihood of Limited Literacy (MD = 2.20, Sensitivity and specificity at the cutoff scores are available in Table 3.

AAHL-Adult Interactive Health Literacy (AAHL-Adult-IHL)
The AAHL-Adult-IHL item bank included 15 items. After the Rasch analyses, 5 items were removed, leaving a 10-item final AAHL-Adult-IHL assessment (see Appendix B). Similar to the adolescent interactive HL assessment, the AAHL-Adult-IHL tested adults' skills for interacting with providers, multiple sources of contradictory information, and using knowledge to inform current behavior. Three items were removed due to low point-measure correlations.
One item was removed due to high DIF on race and education, while another item was removed due to high DIF on race and age. Removal of one round of the most misfitting person responses improved model fit; as such, the final model estimation was completed using the adjusted dataset. Point-measure correlations for the final 10 items were 0.35-0.53. Assumptions of  Convergent validity was established with the NVS (r = 0.52, p < 0.001) and SILS (r = 0.42, p < 0.001). The sensitivity of the AAHL-Adult-IHL assessment to detect individuals with NVS-Possibility of Limited Literacy (vs. NVS-High Likelihood of Limited Literacy) at the 4.5 cutoff score was 82.3%, and the specificity was 49.4% with an area under the ROC curve of 0.70 (CI: 0.67, 0.74). When a cutoff score of 8.5 was applied, the sensitivity and specificity were 6.9% and 97%, respectively. The sensitivity of the AAHL-Adult-IHL to detect individuals with NVS-Adequate Literacy (compared to NVS-High Likelihood of Limited Literacy and NVS-Possibility of Limited Literacy) at the 4.5 cutoff was 96.6% and the specificity was 31.7%. At the 8.5 cutoff score, the sensitivity and specificity were 15.4% and 99.5%, respectively. The area under the ROC curve was 0.73 (CI: 0.71, 0.75). For the SILS, the sensitivity of the AAHL-Adult-IHL assessment to detect individuals with SILS-Adequate HL at the 4.5 cutoff score was 90.2% and the specificity was 47.5%, with an area under the ROC curve of 0.77 (CI: 0.74, 0.79). At the 8.5 cutoff score, sensitivity and specificity were 12% and 97.6%, respectively.
Regarding criterion validity, AAHL-Adult-IHL was positively related to questioning the truthfulness of health information found online after adjusting for demographic covariates (OR = 1.16, CI: 1.09, 1.23, p < 0.001). The sensitivity and specificity of the AAHL-Adult-IHL to detect adults who questioned the truthfulness of health information found online at the 4.5 cutoff score were 85% and 26.6%, respectively, with an area under the ROC curve of 0.61 (CI: 0.57, 0.64). At the 8.5 cutoff score, the AAHL-Adult-IHL had a sensitivity and specificity of 11.3% and 96.1%, respectively. AAHL-Adult-IHL was positively related to reading instructions before taking medicine (OR = 1.59, CI: 1.40, 1.81, p < 0.001). The sensitivity and specificity of the AAHL-Adult-IHL to detect individuals who read instructions before taking medicine at the 4.5 cutoff score were 84.8% and 65.4%, respectively, with an area under the ROC curve of 0.82 (CI: 0.77, 0.88). At the cutoff score of 8.5, the sensitivity was 10.6% and the specificity was 100%.

AAHL-Adult Critical Health Literacy (AAHL-Adult-CHL)
The AAHL-Adult-CHL item bank included 9 items that assessed skills for health advocacy and engaging in decision-making where there are socioeconomic barriers. After evaluating the items using the Rasch Partial Credit Model, 7 items were retained for the final assessment (see Appendix C).
Response options were ranked from not at all critical HL to collective advocacy skills, except for items CRHLD2 and CRHLD6, which were scored as incorrect or correct. Two items were removed due to DIF on education and race. Model fit did not improve when the most misfitting person responses were removed. Therefore, final Rasch analyses were conducted on the original dataset. Point-measure correlations for the final assessment were 0.42-0.63.
Assumptions of unidimensionality (eigenvalue = 1.35), local independence (Q3,* test statistic = -0.14), and monotonicity were met. No DIF was detected for gender, education, age group, race, and ethnicity in the final model. Item separation reliability (1.00) was acceptable. The Wright sample-independent person (test) reliability statistic was 0.88, with the scores differentiating 2.7 distinct levels of performance: Inadequate (scores 0-5), Adequate (scores 6-11), and Proficient (scores 12-15). Note that the scores ranged from 0-15 though only 7 items were retained. This is because with Rasch Partial Credit Models, each polytomous response option has a unique score that corresponds to the degree of correctness. At the 11.5 cutoff score, the sensitivity and specificity were 69.2% and 80.8%, respectively.

AAHL-Adult Composite HL (AAHL-Adult-Composite)
Using a Rasch Partial Credit Model, all of the items from the final AAHL-Adult-FHL, AAHL-Adult-IHL, and AAHL-Adult-CHL were fitted in a single model.  were 100% and 0.8%, respectively. At the 15.5 cutoff score, the sensitivity and specificity were 98% and 30.1%, respectively. At the 23.5 cutoff score, the sensitivity and specificity were 72.3% and 79%, respectively. At the 29.5 cutoff score, the sensitivity and specificity were 8.5% and 99.9%, respectively. The area under the ROC curve was 0.83 (CI: 0.81, 0.86).
Regarding criterion validity, AAHL-Adult-Composite was positively related to questioning the truthfulness of health information found online after adjusting for demographic covariates (OR = 1.06, CI: 1.04, 1.10, p < 0.001). The sensitivity and specificity of the AAHL-Adult-Composite to detect individuals who questioned the truthfulness of health information found online at the 7.5 cutoff score were 99.8% and 0%, respectively. At the 15.5 cutoff, the sensitivity and specificity were 93.1% and 6.7%, respectively. At the 23.5 cutoff, the sensitivity and specificity were 65.6% and 51.6%, respectively. At the 29.5 cutoff score, the sensitivity and specificity were 8% and 99.9%, respectively. The area under the ROC curve was 0.62 (CI: 0.58, 0.65). AAHL-Adult-Composite was positively related to reading instructions before taking medicine (OR = 1.23, CI: 1.17, 1.30, p < 0.001). The sensitivity and specificity of the AAHL-Adult-Composite to detect adults who read instructions before taking medicine at the 7.5 cutoff score were 99.9% and 10%, respectively. At the 15.5 cutoff score, the sensitivity and specificity were 93.9% and 38.5%, respectively. At the 23.5 cutoff score, the sensitivity and specificity were 64.7% and 88.5%, respectively. At the 29.5 cutoff score, the sensitivity and specificity were 7% and 100%, respectively. The area under the ROC curve was 0.86 (CI:   Table 4 shows the crosstabs based on AAHL-Adult HL categorization and all chi-squares and Cramer's V tests were significant at the p < 0.001 level, suggesting that there is a relationship between the assessments when categories are used.

Discussion
This study aimed to validate the AAHL, test-based assessments of adolescents' functional, interactive, and critical HL, in an adult sample. Construct validity was established using Rasch Measurement models. All final assessments fit their Rasch models and met Rasch assumptions of monotonicity, unidimensionality, and local independence. When all assessments were modeled as a single scale, the items fit the Rasch Partial Credit Model and met Rasch assumptions. The assessments had good convergent validity with test-based and perceptions-based measures of functional HL and good criterion validity with HL-related behaviors.
The AAHL-Adult-FHL distinguished two categories, while both the AAHL-Adult-IHL and AAHL-Adult-CHL assessments distinguished three categories, and the AAHL-Adult-Composite distinguished five categories. The cutoff scores were determined using the Wright sample-independent person (test) reliability statistic formula. The sensitivity and specificity of these cutoff scores to distinguish categories on the NVS and SILS, as well as individuals who engaged in HL behaviors, were also considered. At the Adequate AAHL-Adult-IHL and AAHL-Adult-CHL cutoff scores, the assessments had higher sensitivity, or better ability to identify respondents with NVS-Possibility of Limited Literacy, NVS-Adequate Literacy, SILS-Adequate HL, and who question the truthfulness of the information found online as well as read instructions before taking medicines. However, this was compromised by a lower ability to correctly identify respondents without the above characteristics. At Proficient AAHL-Adult-IHL and AAHL-Adult-CHL cutoff scores, the assessments had higher specificity and thus better ability to identify individuals who did not have NVS-Possibility of Limited Literacy, NVS-Adequate Literacy, SILS-Adequate HL, and who did not question the truthfulness of the information found online as well as read instructions before taking medicines. Regarding AAHL-Adult-FHL, the Adequate cutoff score showed good sensitivity across the NVS, SILS, and HL-related behaviors. However, acceptable specificity was only observed for NVS-Possibility of Limited Literacy and reading instructions before taking medicine. This assessment may be too short to be able to identify true negatives. Specificity improved at a cutoff score of 5.5; however, given the length of the assessment, the original cutoff score of 4.5 should be used, and the test should be considered a rule-out rather than a rule-in test. Overall, the sensitivity and specificity analyses support the cutoff scores.
Nutbeam (12) argued that functional, interactive, and critical HL are distinct but related types of HL that require an increasingly complex set of skills, with functional HL being the simplest, followed by interactive HL, and then critical HL. This conceptualization was partially supported in this validation study.
Specifically, the results support the relatedness of the three types of HL. The highest category on each assessment had the highest scores on the other assessments (e.g., Adequate AAHL-Adult-FHL category had the highest AAHL-Adult-IHL, AAHL-Adult-CHL, and AAHL-Adult-Composite scores). Nonparametric tests of the categorizations also confirmed linear relationships across the assessments. Contrary to Nutbeam's (12) ordering of complexity of the HL types, the AAHL-Adult-IHL assessment appeared to be more difficult than the AAHL-Adult-CHL assessment. A look at population and the multiple social factors intertwined with race, racial differences in all three core qualities of HL should be explored further. Contrary to much of the research (1, 5, 50), individuals with a graduate degree had lower HL scores than those with less education. It is important to explore intersectionality (e.g., education and race) and consider other relevant variables (e.g., chronic disease status) when drawing conclusions about the HL/education relationship, especially given that interactive and critical HL may be more influenced by experiences navigating health decision-making rather than literacy.
To date, test-based interactive and critical HL measures have been limited (only 1 measure in Boston University Health Literacy Toolshed). In addition to providing test-based measures of adults' interactive and critical HL, this study provides a single validated assessment that includes all three core qualities of HL described by Sorensen and colleagues (2) and Nutbeam (12), thus allowing for assessing and categorizing individuals' core HL. Additionally, the existence of an adolescent version of the measure facilitates dyad studies assessing caregivers' and adolescents' HL.
A limitation of the study is that the use of a convenience, non-representative sample may limit the generalizability of the utility of the assessments.
However, this is a minor concern given that Rasch models are sample-independent and less subject to sample bias than classical test theory. Another limitation is that the items may not be appropriate for all cultures. For example, the AAHL-Adult-FHL assessment requires reading a cafeteria menu and over-the-counter prescription label; individuals in cultures where these items are not common may have more difficulty answering related questions and be misclassified as having inadequate functional HL. Cultural appropriateness is an important consideration for using and/or adapting this measure.
Relatedly, the sample size was appropriate for conducting Rasch analyses; however, it was insufficient for exploring DIF for age in years (vs. age groups) and some racial groups. Future research should assess DIF for age in years, Alaskan Native, Native Americans, Native Hawaiian and Other Pacific Islanders, as well as other important demographic and social variables (e.g., federal poverty level, chronic illness status). The AAHL-Adult should also be included in longitudinal study designs to assess its predictive validity and time invariance.
The AAHL-Adult has both research and clinical utility. Many interventions targeting health behavior change employ strategies that require or intervene on functional, interactive, and critical HL but do not adequately assess HL in their evaluations. Including test-based assessments of HL in intervention batteries allows for evaluating HL as a moderator, mediator, and/or outcome and may help inform future intervention modifications to improve outcomes.
Importantly, the AAHL-Adult is not subject to social desirability bias or people misestimating their abilities and competencies as is likely in perceptions-based instruments. Researchers and public health professionals may use the assessments to assess their intended populations to identify HL intervention needs and determine the suitability of their non-HL interventions. Regarding clinical utility, test-based assessments of HL may be used to identify adults who require assistance navigating their health care or inform how providers interact with patients.

Conclusion
The finalized AAHL-Adult-FHL, AAHL-Adult-IHL, AAHL-Adult-CHL, and AAHL-Adult-Composite assessments met Rasch assumptions and had good model fit and good convergent and criterion validity. The AAHL-Adult-IHL and AAHL-Adult-CHL are the first test-based measures of interactive and critical HL validated in the U.S. The AAHL-Adult-Composite is also the first test-based assessment validated in the U.S. that measures all three core qualities of HL. These assessments have utility across multiple settings, including public health program planning and evaluation, intervention development and evaluation, and clinical settings. These assessments will also likely contribute significantly to how HL is studied in future studies.
This study was approved by the Tufts University Social Behavioral and Educational Institutional Review Board. All methods were performed in accordance with the approved protocol and relevant guidelines and regulations for human participants research. Informed consent was obtained for adults' participation in data collection.

Consent for publication
Not applicable.

Availability of data and materials
The dataset analysed during the current study is not yet publicly available given that it is a relatively small dataset and the corresponding author is still publishing primary papers using the dataset. The dataset can be made available from the corresponding author on reasonable request.

Competing interests
The author declares that they have no competing interests.

Funding
This work was supported by the National Institutes of Health [grant numbers 1K12HD092535, L30DK126209]. The funding agencies had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author contributions
SAF designed the study, collected the data, and analyzed and interpreted the data. SAF drafted and revised the manuscript. SAF approved the submitted version and agrees to be personally accountable for their contributions. SAF agrees to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.