Stage 1: Translation of PEMAT into Japanese
We translated the PEMAT questionnaire and user’s guide with the permission of AHRQ. The translation and cross-cultural adaptation process was carried out according to the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Task Force and the Guidelines for the Process of Cross-Cultural Adaptation of Self-Report Measures. Two translators whose native language is Japanese independently translated the original PEMAT into Japanese (T1 and T2). A health communication researcher (TO) and a physician (EF) reviewed T1 and T2 and integrated them into a single Japanese forward-translated version (T-12). Two translators whose native language is English independently back-translated T-12 into English (BT1 and BT2). The back-translators were neither aware of the concept of PEMAT nor involved in the forward translation. After comparing and integrating BT1 and BT2, the expert committee, consisting of a health communication researcher (TO) and three medical professionals (a physician (EF) and two nurses (HU and HO)), created the final Japanese version of PEMAT.
Stage 2: Assessment of content validity by expert panel
At a panel meeting of twelve experts (medical professionals and health communication researchers), we determined whether the items were applicable to the Japanese cultural context and whether the content was appropriate. The questionnaire’s original developers examined and provided feedback on the Japanese version.
Stage 3: Determining the reliability of the instrument
We tested the inter-rater reliability of the PEMAT using patient education materials written in Japanese. The eligibility criteria for the materials were as follows: (1) developed by academic societies, government offices, or non-profit organizations; (2) covering any of the nine topics presented in Health Japan 21 (2nd edition), namely nutrition and dietary habits, physical activities and exercise, rest and mental health, smoking, alcohol, dental health, diabetes, cardiovascular disease, and cancer; and (3) available to download for free from the Internet. We searched via the two most popular search engines in Japan, Google Japan and Yahoo! Japan. The search terms in Japanese were ‘topic’ (where ‘topic’ was any of the nine topics in Health Japan 21) AND ‘pamphlet’ OR ‘leaflet’ OR ‘video’ OR ‘patients’ OR ‘explanation.’ We then selected the first 100 written materials and the first 50 audiovisual materials from the search results.
Four evaluators (EF, RS, RI, and RY) rated the patient education materials according to the Japanese version of PEMAT. EF and RS are physicians; RI is a nutritionist, and RY is a health communication researcher. Two rounds of reliability testing were performed because of the low reliability found in the first round. In each round, the evaluators followed the guidance on the question items and evaluation methods using the Japanese version of the PEMAT User’s Guide before evaluating the material. In the first round, two evaluators (EF and RI) independently assessed 50 materials using PEMAT-A/V. EF, RS, and RY each assessed 67 materials with PEMAT-P. Before starting the second round, we revised the explanations and examples in the User’s Guide through discussion with the evaluators. Then the evaluators switched the materials to ensure they did not evaluate the same materials as in the first round. After the evaluation, the mean PEMAT score was calculated, and the materials with the highest and lowest scores were selected.
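The score calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the scoring convention of the original PEMAT (each item rated agree = 1, disagree = 0, or not applicable, with the score expressed as the percentage of applicable items rated "agree"). The function name `pemat_score` and the ratings shown are hypothetical.

```python
from statistics import mean


def pemat_score(ratings):
    """Return a PEMAT score (%) from one rater's item ratings.

    ratings: list of 1 (agree), 0 (disagree), or None (not applicable).
    Not-applicable items are excluded from the denominator.
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    return 100.0 * sum(applicable) / len(applicable)


# Hypothetical ratings of one material by two raters (illustrative data only)
rater_a = [1, 1, 0, 1, None, 1, 0, 1]
rater_b = [1, 1, 0, 1, None, 1, 1, 1]

score_a = pemat_score(rater_a)
score_b = pemat_score(rater_b)

# Mean PEMAT score for the material across raters
mean_score = mean([score_a, score_b])
print(f"rater A: {score_a:.1f}%, rater B: {score_b:.1f}%, mean: {mean_score:.1f}%")
```

Materials can then be ranked by `mean_score` to pick the highest- and lowest-scoring ones.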
Stage 4: Assessment of construct validity by testing with the general public
In this stage, we conducted an online survey to determine whether non-experts found the materials with high and low PEMAT scores (from the expert evaluation in stage 3) correspondingly easy or difficult to understand and act on. The online survey consisted of two studies to test the validity of the PEMAT-P and PEMAT-A/V, one with a leaflet presentation and the other with a video presentation.
Study participants were recruited from registered monitors of an online survey company (Rakuten Insight). Men and women who use Japanese as a native language were eligible to participate in the study. The PEMAT-P study included participants aged 18 to 69 years, and the PEMAT-A/V study included participants aged 60 to 79 years, because the materials used in the two studies targeted different age groups, as described below. Participants were excluded if they had experience in health care or were restricted from practicing the action recommended in the materials due to illness or injury. Participants were randomized into two groups using a central computerized random allocation system. One group (intervention group) viewed the material that was highly scored by experts in stage 3, while the other (control group) viewed the low-scoring material.
For testing PEMAT-P, participants viewed leaflets that promote healthy eating habits. The PEMAT-P score of the leaflets was 100% for the high-scoring material group and 69.7% for the low-scoring material group. In testing PEMAT-A/V, we used videos on the topic of locomotive syndrome prevention for the elderly. Locomotive syndrome refers to conditions involving a high risk of motor function decline due to locomotive organ impairment. The overall PEMAT-A/V scores of the videos were 85.4% (intervention group) and 25.0% (control group).
The survey company provided participants’ gender and age, and participants responded to questions about their educational background, annual family income, occupation, marital status, and self-perceived health. Participants also answered questions about their baseline health literacy and perceived self-efficacy. Health literacy was measured using the 14-item health literacy scale for Japanese adults (HLS-14). Self-efficacy was measured by the Self-Efficacy Scale for Positive Eating Behavior for PEMAT-P and the Home-Exercise Barrier Self-Efficacy Scale for PEMAT-A/V.
After responding to these questions, participants viewed the relevant materials. They then rated how easy the material was to understand and to act on, on a scale from 1 to 10. They also responded to eight selected items in PEMAT (items 1, 4, 8, 9, 11, 17, 19, and 21) (see Table 1 and Table 2). These items were asked in both the PEMAT-P and PEMAT-A/V studies and were relevant for all the materials presented. At the end of the survey, participants rated their self-efficacy immediately after the intervention on a scale from 1 to 10. Participants scored an item as 1 if they completely disagreed with its content and as 10 if they completely agreed with it.
Additional validation with readability scores
We also used the text readability measurement system ‘jReadability’ [20, 21], because the developers of the original PEMAT recommended using readability assessment tools alongside the PEMAT when evaluating printed materials. To assess readability, we manually retrieved the text from the printable materials and transcribed the audio from the audiovisual materials.
In stage 3, the PEMAT scores of the two raters were averaged, and the inter-rater reliability of the PEMAT-A/V tool was calculated. Inter-rater reliability was used to assess the external consistency of the PEMAT, using percentage agreement and Cohen’s kappa for two evaluators. We also calculated Gwet’s AC1 when low kappas occurred despite a high percentage of agreement. Inter-rater agreement was deemed poor (0), slight (0.01–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), or almost perfect (0.81–1.0).
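The three agreement statistics named above can be illustrated with a short sketch for two raters and binary (agree/disagree) item ratings. This is an assumption-laden example rather than the analysis actually run: the function name `agreement_stats` and the ratings are hypothetical, and the data are constructed so that percentage agreement is high while kappa is low, which is the situation in which Gwet’s AC1 is informative.

```python
def agreement_stats(a, b):
    """Percentage agreement, Cohen's kappa, and Gwet's AC1 for two raters.

    a, b: equal-length lists of binary ratings (1 = agree, 0 = disagree).
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n  # observed agreement

    p_a1 = sum(a) / n  # proportion of items rater A scored 1
    p_b1 = sum(b) / n  # proportion of items rater B scored 1

    # Cohen's kappa: chance agreement from each rater's own marginals
    p_exp_kappa = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if p_exp_kappa < 1:
        kappa = (p_obs - p_exp_kappa) / (1 - p_exp_kappa)
    else:
        kappa = float("nan")  # undefined when both raters use one category only

    # Gwet's AC1 (two categories): chance agreement from the mean marginal
    pi = (p_a1 + p_b1) / 2
    p_exp_ac1 = 2 * pi * (1 - pi)
    ac1 = (p_obs - p_exp_ac1) / (1 - p_exp_ac1)

    return p_obs, kappa, ac1


# Hypothetical ratings: the raters agree on 18 of 20 items, nearly all scored 1
rater_a = [1] * 18 + [1, 0]
rater_b = [1] * 18 + [0, 1]

p_obs, kappa, ac1 = agreement_stats(rater_a, rater_b)
print(f"agreement = {p_obs:.2f}, kappa = {kappa:.3f}, AC1 = {ac1:.3f}")
```

With these skewed marginals, kappa falls slightly below zero even though the raters agree on 90% of items, whereas AC1 stays high.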
In stage 4, the primary analysis was on an intent-to-treat basis. Sample size calculation was performed based on an effect size of 0.2 (Cohen’s d), a significance level of .05, and a power of 0.8. It was estimated that 394 participants per group were required. Differences between the control and intervention groups were evaluated using the two-sample t-test for age and the chi-square test or Fisher’s exact test for sex, educational background, occupation, annual household income, marital status, and self-perceived health. Welch’s t-test was used to compare understandability, actionability, and perceived self-efficacy between the two groups. Pearson’s correlation coefficient was used to determine whether there was a correlation between the PEMAT understandability scores and jReadability scores.
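As a rough check on the sample size and an illustration of the group comparison, the sketch below uses only the Python standard library. It rests on two stated assumptions: the sample size uses the normal approximation for a two-sided, two-sample comparison, which gives a value close to, but slightly below, a t-distribution-based calculation such as the one presumably behind the reported 394; and the ratings passed to the hypothetical `welch_t` helper are invented for illustration. Obtaining a p-value would additionally require the t distribution, which is not computed here.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-sample test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


n = n_per_group(d=0.2)
print(f"approximate participants per group: {n}")

# Hypothetical 1-10 understandability ratings (illustrative data only)
intervention = [8, 7, 9, 6, 8]
control = [5, 6, 4, 7, 5]
t_stat, dof = welch_t(intervention, control)
print(f"Welch t = {t_stat:.3f}, df = {dof:.1f}")
```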
All p-values were two-sided, and p < .05 was considered statistically significant. All analyses were conducted with R version 4.0.3 (2020-10-10).