Validation and adjustment of the Patient Experience Questionnaire (PEQ) based on a Norwegian hospital study.

Background: This paper assesses the psychometric qualities of the Patient Experience Questionnaire (PEQ), thereby validating a patient-oriented service climate measurement model in a hospital environment, and modifies the model based on empirical results. This study employed survey data gathered by the Norwegian Institute of Public Health from adult inpatients at somatic hospitals in the Health South-East RHF in Norway. The survey engaged 4,603 patients out of 8,381 from 5 main hospitals in the region. Data was analysed with the aid of AMOS, SPSS, and Microsoft Excel. Results: The study found that an 8-factor model of the PEQ generally showed good fitness to the data, but assessment of discriminant validity showed that this was not the optimal factor solution among 4 of the 8 dimensions. After comparisons of models, support was finally found for a model with a second-order factor for 4 of the factors: “nurse services”, “doctor services”, “information”, and “organisation”, collectively named “treatment services”. The proposed model demonstrated good validity and reliability results. Conclusions: The results present theoretical and practical implications. The study recommends that inferential analyses on the PEQ should be done with the second-order factor. Also, a revision of the PEQ is recommended subject to more confirmatory studies with larger samples in different regions. The study indicates a second-order factor structure for assessing and understanding patient experiences, a finding having both theoretical and managerial implications.


Introduction And Background
Healthcare professionals are facing heavy pressure to meet the growing needs and expectations of patients (1). This owes to the increasing and alarming rate of morbidity and multi-morbidity in Western countries (2), together with aging populations and the healthcare needs of the aged. This pressure on healthcare professionals has increased in recent times with the outbreak of global pandemics such as COVID-19. Notwithstanding these morbidity rates and growing needs of patients, healthcare providers and professionals are expected to ensure positive patient experiences. This study, focusing on hospitals and their professionals, seeks to examine the patients' experiences with hospital service climate, focusing on the psychometric quality of a patient-reported experience measure (PREM).
The endeavour of gathering patients' experiences with healthcare has gained popularity, thus resulting in the development of PREMs that have been used in surveys in various countries. In a bid to clarify the meaning of patient experiences, Wolf and Jason (3) synthesised various definitions of the concept and maintained that patient experiences comprise individual as well as collective events and occurrences that happen in the process of caregiving, and this has strong links with the patients' expectations and how they were met. Wagland, Recio-Saucedo (4) noted that significant progress has been made in understanding patient experience. The concept is viewed as interactions of patients with aspects of the healthcare delivery such as nurse services, doctor services, organisation of the caregiving process in hospitals, and information delivery, where these aspects (dimensions) culminate in the entire continuum of experience that patients have with healthcare, as reported by the patients.
From patients' perspective, interactions with dimensions of healthcare have been theoretically underpinned by the Donabedian (5) framework for assessing healthcare quality (6) and considered the most widely used in the healthcare sector to assess quality (7). According to this framework, quality of healthcare can be assessed by making inferences under 3 categories: structure, process, and outcome.
The structure deals with the setting in which care is given, for instance, facilities, equipment, and human resources. The process deals with what is done in giving and receiving care, for instance, nurse and doctor services as well as good communication and information sharing between patients and hospitals. Lastly, the outcome deals with the effects of care on health and well-being (5,7).
Increased understanding of patient experiences of hospital climate has similarly been aided by increased research and several studies on measuring the construct. Measurements in social science provide adequate guidelines for assessing phenomena and people's attributes that are not directly and easily observable (8). Employing poor and inadequate measures can be very costly to research, in terms of drawing invalid conclusions, making policy decisions based on false information, and wasting respondents' time and efforts (9). DeVellis (9), however, indicated that a major challenge to developing adequate measures in social science is the immaterial nature of social science constructs supported by constantly changing theories. This makes measurements in social science susceptible to constant changes in performance and adequacy in assessing the constructs. Consequently, social science measures need to be constantly reviewed and reassessed to keep them abreast with changing theories and constructs and to uphold their validity and reliability. Therefore, reassessing PREMs to ensure adequate psychometric qualities is essential for theoretical and practical advancement of knowledge of patients' experiences, hence the focus and aim of this study.
Justification of the study
The goal to accurately measure patient experiences has resulted in several PREMs for general and specialised healthcare. The questions and dimensions that these PREMs have produced are indicative of patients' shared experiences. Most of these measures identified similar dimensions of experiences, such as those relating to nurse services, doctor services, information and communication, hospital organisation and standards, and discharge from the hospital (10-14). Although some of these studies differed with regard to the naming of the dimensions, the content of the items remained very similar among the PREMs. This study is underpinned by 3 main justifications: (i) psychometric statistical analyses have evolved over the years with more robust tools in validating scales; (ii) due to the plethora of patient experience measures and unascertained psychometric qualities, existing PREMs should be re-examined to ascertain their validity and reliability, rather than developing new ones; (iii) there is the need for scrutiny of existing PREMs because they may be prone to theoretical and practical changes. These justifications are elaborated below.
The Norwegian Institute of Public Health (NIPH) conducted a survey in the East health region among a few hospitals, adapting an earlier validated PREM, the Patient Experience Questionnaire (PEQ) (15). In the development and validation study, Pettersen, Veenstra (15) employed literature reviews, focus groups, pilot studies, and 2 cross-sectional surveys (1996 and 1998) across 14 hospitals in Norway. The study used exploratory factor analysis, reliability test (Cronbach's alpha), and construct validity test. The study found 10 factors and 20 final items out of an initial 35 items: "information on future complaints", "nursing services", "communication", "information examinations", "contact with next-of-kin", "doctor services", "hospital and equipment", "information medication", "organisation", and "general satisfaction". All the factors recorded Cronbach's alpha scores between 0.61 and 0.83. Construct validity was also ascertained in the study by examining the relationship between the instrument and demographic factors such as age and gender. Stressing the lack of valid and reliable instruments, Pettersen, Veenstra (15) concluded that it is imperative to re-examine existing patient experience measures so as to improve methodology. They further recommended employment of the PEQ for future in-patient experience surveys, hence the choice for the current study. Although this measure was adapted and modified for use by the NIPH, the performance of the measure should be called into question because this measure was developed and validated more than a decade ago. Psychometric analyses are evolving with more robust validating tools and methods, and this is evident in the study by Pettersen, Veenstra (15), where issues such as discriminant validity and measurement invariance as well as other psychometric issues were absent in the analyses, a gap that the current study tackles.
Beattie, Murphy (16) also noted the problem of multiple patient experience measures with unascertained psychometric quality. This problem has hindered the use of data from patient experience surveys to adequately improve and sustain quality of care in hospitals. In the systematic review, Beattie, Murphy (16) developed a matrix to help choose PREMs for research and to identify research gaps in existing ones. This matrix showed that the PEQ study by Pettersen, Veenstra (15) lacked analyses such as criterion-related validity. On this basis, the current study asserts that rather than developing more PREMs (which seem already saturated), existing ones should be re-examined, as recommended earlier by Pettersen, Veenstra (15), in light of current analyses and conceptual underpinnings. This need for re-examination has also been recommended by other systematic reviews on patient experience (17,18).
Additionally, some PREMs have been developed in Norway to capture the phenomenon of patient experiences with general health practice as well as experiences with specific health issues and fields, with most of them asking questions on general patient satisfaction (e.g. 12,15,19,20). DeVellis (9) noted that the constantly changing theories of social science constructs challenge the adequacy of their measures. Haugum, Danielsen (17) similarly recommended the need to repeat patient experience surveys and their outcomes in order to generate more validated instruments, as they are potentially affected by constantly changing contextual factors. By inference, it can be said that the underlying psychometric rigour of a PREM can dwindle as it is employed over a long period. Although several surveys exist on patient experiences on various issues (e.g. 21-25), a re-analysis of the psychometric performance of any particular measure is lacking. The quest to improve healthcare delivery and hospital service climate based on patients' experiences should begin with ascertaining the psychometric quality of PREMs. Based on these justifications, the purpose of this article is to test the psychometric qualities of the PEQ, thereby validating a patient-oriented service climate measurement model in a hospital environment.

Conceptualisation of patient-oriented hospital service climate
The concept of 'climate' has been used extensively in the organisational setting for decades as an indicator of conditions and shared perceptions from organisational members, mainly employees. The concept is defined as comprising measurable aspects of the work environment that are perceived and shared as the formal and informal practices, policies, and procedures by individuals in an organisation (26,27). Thus, these perceptions are socially constructed and are shared among the individuals (28); furthermore, they have an influence on individuals' motivations and actions (26). A number of subconcepts of organisational climate have been developed, including safety climate (29) and service climate (30). Service climate is explained as the shared perceptions of employees regarding service quality based on policies, practices, and procedures (31), reflected in feedback from customers on service and satisfaction, among other elements (30). These concepts are applicable and measurable in different industries and sectors. In the health sector, organisational climate and sub-concepts have been explored and examined mainly from the workers' perspective to indicate employee conditions, quality of patient care, and patient safety (e.g. 32).
This study links organisational climate with Donabedian's framework to conceptualise hospital service climate, not from the employees' perspective but from the patients' perspective. Thus, in this study, hospital service climate is defined as patients' shared experiences and perceptions of hospital work environments, the services, and the formal and informal practices and procedures that make up the entire caregiving process, which inform patient outcomes such as satisfaction, health benefits, and health level.
In sectors such as health, the perceptions of patients are imperative for managerial and organisational development. Patients' reports on their experiences with the hospital climates, such as the structures and processes of giving care, can therefore be useful indicators and predictors of their outcomes, evidenced in previous studies (e.g. 33,34,35). This affords hospital management adequate information on their service climate in order to improve the quality of healthcare. The study pursues its aim in light of this theoretical linkage, based on the idea that adequate PREMs generate adequate information on patients' experiences with hospital climates and resulting outcomes.

Sample and data collection
This study employed anonymous secondary survey data from the Norwegian Institute of Public Health gathered from adult inpatients at somatic hospitals in the Health South-East RHF in Norway. The survey engaged patients from 5 main hospitals in the region who were admitted for at least a day. As part of the eligibility criteria, patients who were admitted between October and November in 2015 were included. The survey was started by the Norwegian Knowledge Centre for Health Services in the fall of 2015 and was continued at the Norwegian Institute of Public Health in the first quarter of 2016. Patients who visited the 5 hospitals were identified through their contact information after they were discharged. Questionnaires were sent to their respective addresses by post with a return envelope. In total, 8,381 patients were eligible and contacted. The total number of respondents who completed and returned their questionnaires was 4,603, yielding a response rate of 54.92%. Patients were asked to consider various aspects of their experience of being admitted. The questionnaire aimed at using feedback to identify which areas are working well and which areas the hospital should work to improve.
Background information, such as questions on whether or not the patient chose the hospital he/she was admitted to, was also included in the questionnaire.

Data analysis
Preliminary analyses
The study analysed the data with the aid of Microsoft Excel, SPSS v.24, and AMOS v.25. Preliminary analysis (such as checking for normality, outliers, and missing value analysis) was conducted in SPSS. In order to ensure maximum privacy of respondents and still maintain relevant variables for analysis, departments for the analysis were aggregated into medical departments (Med) and surgical departments (Kir) across the hospitals based on the more specific and varied information on units in the hospitals provided by participants. This aggregation was performed according to the departmental codes for health institutions provided by the Norwegian Health Authority.

Measurement model development
The initial measurement model (see Model 1 in Table 3) was developed in AMOS without modification indices (due to the exclusion of missing values). Missing values were replaced after the estimation of the initial model to obtain modification indices for correlating error terms among the items and improving the fitness of the model (see Model 2 in Table 3). Due to the non-randomness of the missing values, being mindful of how they were replaced was necessary. The study chose to use multiple imputation to replace them as recommended for non-randomness (36,37). However, the 5 different imputations generated could not be pooled in AMOS as a single imputation for the estimation of the model. Thus, the missing values were eventually replaced with the series mean method. It is noteworthy that the missing values were only replaced in order to generate a full estimation with modification indices for correlating the error terms. Although all subsequent models after the initial model were estimated with the correlated error terms, estimations were done on the data with missing values, with the aim of obtaining a more accurate fit of the data to the models.
The initial model with modifications (correlated error terms), Model 2, was compared to 6 other models (see Models 3-8 in Table 3), obtained by combining some dimensions into a single factor to further justify the fitness of the modified initial model. These combinations were based on the correlation coefficients between the dimensions (see Table 3 for the combinations of the dimensions). In addition, a proposed model containing a second-order factor for "nurse services", "doctor services", "information", and "organisation" was also developed and compared to the initial modified model based on the validity tests, correlation analyses, and theoretical justifications (wording of questions). Fitness of all the models was ascertained using the following indices: Comparative Fit Index (CFI), Tucker Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), and the PCLOSE. The thresholds recommended by Hu and Bentler (34) are presented in Table 1.
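The series mean replacement described above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study; the function name `series_mean_impute` and the example responses are hypothetical, and missing answers are assumed to be coded as `None`.

```python
from statistics import mean


def series_mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical responses to one questionnaire item (None = missing answer).
item = [4, 5, None, 3, None, 4]
imputed = series_mean_impute(item)
print(imputed)
```

Because every missing answer receives the same value, this method shrinks the item's variance, which is consistent with the study's choice to use the imputed data only for obtaining the indices needed to correlate error terms.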
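For readers unfamiliar with these indices, the sketch below shows how CFI, TLI, and RMSEA are commonly computed from the chi-square statistics of the tested model and of the null (independence) model. It is an illustration under stated assumptions, not AMOS output: the function name and all input values are hypothetical, the RMSEA denominator uses N - 1 (some software uses N), and PCLOSE is omitted because it requires the non-central chi-square distribution.

```python
import math


def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Return CFI, TLI and RMSEA from model and null-model chi-square values."""
    # Non-centrality estimates, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, 0.0)
    denom = max(d_model, d_null)
    cfi = 1.0 - d_model / denom if denom > 0 else 1.0
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    tli = (ratio_null - ratio_model) / (ratio_null - 1.0)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return {"CFI": cfi, "TLI": tli, "RMSEA": rmsea}


# Hypothetical values, chosen only to show the arithmetic.
indices = fit_indices(chi2_model=300.0, df_model=100,
                      chi2_null=5000.0, df_null=120, n=1001)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```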

Validity and reliability
Validity in this study was ascertained using convergent, discriminant, and predictive validity tests. Convergent validity deals with the relationship between a latent construct (patient experience dimensions) and its items (38). The Average Variance Extracted (AVE) was used to check convergent validity, where values must be at least 0.50, indicating that at least half of the variance in the construct (dimension) is explained by its items. Discriminant validity focuses on a construct and its items in relation to other constructs, that is, how different one construct (or dimension) and its items are from other constructs in the model (38). Discriminant validity was examined using the Fornell-Larcker procedure, where discriminant validity is supported when the square root of the AVEs is greater than the correlation coefficients between the constructs (39). Predictive validity focuses on the ability of the measure and dimensions to relate to and predict previously ascertained outcomes in literature. This was determined through correlation and regression analyses between patient experiences (and dimensions) and outcome variables (patient satisfaction, health benefits, and health level) with the aid of SPSS. Reliability of the measurement model was also determined using composite reliability values for every dimension of the patient experience measure, with a recommended value of at least 0.70.
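The three criteria above can be sketched in Python from standardised item loadings. The formulas are the conventional ones (AVE as the mean squared loading, composite reliability from summed loadings and error variances, and the Fornell-Larcker comparison of the square root of the AVE with the inter-construct correlation). The function names, loadings, and correlations are hypothetical and are not taken from the study's tables.

```python
import math


def ave(loadings):
    """Average Variance Extracted: mean of the squared standardised loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability, with error variance 1 - loading^2 per item."""
    total = sum(loadings)
    error = sum(1 - l * l for l in loadings)
    return total * total / (total * total + error)


def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """True when sqrt(AVE) of both constructs exceeds their correlation."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) > abs(corr_ab)


# Hypothetical standardised loadings for two dimensions.
dim_a = [0.8, 0.7, 0.6]
dim_b = [0.9, 0.8]

ave_a, ave_b = ave(dim_a), ave(dim_b)
cr_a = composite_reliability(dim_a)
print(f"AVE a={ave_a:.3f}, AVE b={ave_b:.3f}, CR a={cr_a:.3f}")
print(fornell_larcker_ok(ave_a, ave_b, corr_ab=0.75))
print(fornell_larcker_ok(ave_a, ave_b, corr_ab=0.60))
```

In this made-up example, dimension `a` passes the 0.70 reliability threshold while falling just short of the 0.50 AVE threshold, which shows that the two criteria can disagree for the same set of items.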

Ethical considerations
This study, with regard to data collection, analysis, and compilation, was conducted within the ethical and legal provisions and guidelines of the Norwegian Institute for Public Health (NIPH) and the University of Stavanger. The Norwegian Data Protection Authority and the Norwegian Directorate of Health approved the procedures in the survey. The hospital data protection official assessed the data processing in the hospitals where survey extension took place. Informed consent was obtained from participants in the survey. Respondents were informed that participation was voluntary and they were assured of confidentiality of the information they would provide. Respondents were also informed that they could opt out of the survey at any point as well as the procedure for opting out if they wished. Data was stored in a safe repository with a password, only accessed by the researchers. This study did not present results that revealed patients' identities, thus maintaining anonymity of respondents and confidentiality of responses. All relevant ethical requirements were duly upheld.

Preliminary analysis and sample characteristics
The study made use of responses from 4,603 participants. Outliers were recorded for some of the questions, but this was to be expected considering the varied background characteristics, such as age and number of days spent in the hospital, which could influence participants' experiences. Nonetheless, most of these outliers were not deemed extreme based on the 1.5 and 3.0 interquartile range criteria. Normality was also ascertained, using the -2 to +2 range (40), for all items of patient experience, except the kurtosis value for 1 item on "nurse services" and 1 item on "doctor services". Overall, the data could be said to be normally distributed to a large extent. The sample for the study was taken from 5 hospitals and characterised by a fairly even age distribution of patients across 3 groups: 60 years and below, between 61 and 73 years, and 74 years and above. Most of the respondents were admitted for 3 or fewer days, and more of them were also admitted to the medical department aggregate (Med). Table 2 presents the sample characteristics for this study.
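The 1.5 and 3.0 interquartile range criteria mentioned above can be sketched as follows. A value lying more than 1.5 IQRs outside the quartiles is flagged as an outlier, and one lying more than 3.0 IQRs outside is flagged as extreme. The function name and data are hypothetical, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which may differ slightly from the quartile definition SPSS applies.

```python
from statistics import quantiles


def classify_outliers(values):
    """Split values into mild (>1.5 IQR) and extreme (>3.0 IQR) outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild, extreme = [], []
    for v in values:
        # Distance outside the box; zero for values between Q1 and Q3.
        if v < q1:
            dist = q1 - v
        elif v > q3:
            dist = v - q3
        else:
            dist = 0
        if dist > 3.0 * iqr:
            extreme.append(v)
        elif dist > 1.5 * iqr:
            mild.append(v)
    return mild, extreme


# Hypothetical number of days spent in hospital for 11 patients.
days = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 20]
mild, extreme = classify_outliers(days)
print("mild:", mild, "extreme:", extreme)
```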

Initial measurement model development, modifications, and comparisons
The initial CFA model (Model 1 in Table 3), with the 8 dimensions of patient experience, was then developed to be tested. The model showed acceptable fitness to the data based on fitness indices. Nonetheless, there was a need to improve the fitness through modifications in order to reduce measurement errors and to obtain more accurate loadings of the observed items on their dimensions. Some modifications were made by drawing covariance between some error terms on the same dimensions with the rationale that, by virtue of sharing commonalities on the dimension, they are more justified to share similar error terms, thus reducing duplications of random measurement error of items. In total, 19 modifications were made based on the covariance coefficients, with the highest coefficient as 895.667 between S2 and S4 ("standard") and the lowest as 40.390 between D4 and D7 ("doctor services"). Aside from the coefficients, these modifications were theoretically justified. For example, the item D2 was worded, "Did you find that the doctors took care of you?", and D4 was worded as "Did the doctors have time for you when you needed it?" Participants may have given closely related responses due to the phrases "taking care" and "having time when you needed"; therefore, it was no surprise that they shared similar error terms, leading to a considerable covariance coefficient. These statistical and theoretical justifications were made for each covariance drawn. The most modifications were made to "doctor services" (seven), followed by "standard" (five), "nurse services" (four), "information" (two), and "organisation" (one). No modifications were made to "next of kin", "discharge", or "interaction", owing to very low covariance coefficients (below 20). The initial model with these modifications (Model 2) thus produced excellent fitness values for all indices.
Furthermore, the model was compared to 6 other models (explained under Methods section), where the initial model with modifications showed the best fitness to the data. The fitness indices of the initial model before and after modifications, as well as those of the 6 alternative models for comparisons, are presented in Table 3.

Measurement invariance across hospital departments aggregated into two groups
Model 2 was further examined for invariance across 3 categories: configural, metric, and scalar. Measurement invariance tests seek to ascertain whether the measurement model differs across different groups in a dataset. The goal is to achieve little or insignificant variance across these groups in order to inspire confidence in the ability of the measure to generate accurate responses and assessments across groups (41). Configural invariance results (see Model 9 in Table 3) showed that the model had acceptable-to-excellent fitness to the data, thus ascertaining configural invariance for the 8-factor patient experience measure across the 2 hospital department aggregates. With regard to metric invariance, the chi-square test showed that the fully constrained model and the unconstrained model were different across the department groups and, thus, not metrically invariant. However, MacKenzie, Podsakoff (42) maintained that "full metric invariance is not necessary for further tests of invariance and substantive analyses to be meaningful, provided that at least one item (other than the one fixed at unity to define the scale of each latent construct) is metrically invariant" (p. 325). Thus, the critical ratios test was performed to examine whether the dimensions and the items were metrically invariant enough for further meaningful analyses. The analysis revealed that for all dimensions, with the exception of "next of kin", there was at least 1 item whose loading did not differ significantly between the groups (that is, was metrically invariant) besides the item that was constrained for that dimension in the model. This means that the 2 items on the "next of kin" dimension had significantly different loadings (parameters) across the aggregated departments. Nonetheless, this test showed the model was metrically invariant across the departments to a large extent. The results of this test are presented as a supplementary table.
Scalar invariance was then examined for the model based on the differences in the measurement intercepts. The analyses showed that the model did not have scalar invariance. Differences in intercept estimates of items between the departments were computed, showing that almost all the items did not have scalar invariance across the 2 departments. The results are presented as a supplementary table.
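The chi-square comparison used for the metric invariance test is conventionally a difference test between the constrained and unconstrained multi-group models. The notation below is illustrative and is not taken from the study's output:

```latex
\Delta\chi^2 = \chi^2_{\text{constrained}} - \chi^2_{\text{unconstrained}},
\qquad
\Delta df = df_{\text{constrained}} - df_{\text{unconstrained}}
```

Under the hypothesis of invariance, $\Delta\chi^2$ follows a chi-square distribution with $\Delta df$ degrees of freedom, so a significant $\Delta\chi^2$ indicates that the equality constraints worsen model fit and that the constrained parameters differ between the groups.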

Reliability
Reliability for the measure was ascertained using composite reliability (CR) values. Generally, CR values above 0.70 are deemed acceptable to justify reliability. From Table 4, it is seen that all the dimensions recorded CR values above 0.70, with the highest being "doctor services" (0.92) and the lowest being "interaction" (0.72).

Convergent and discriminant validity
Convergent validity was examined using the AVE values, where an AVE value of at least 0.50 is considered acceptable (38). Table 4 shows that all dimensions, with the exception of "standard", recorded values above 0.50, thus ascertaining convergent validity. Discriminant validity was ascertained using the Fornell-Larcker procedure. There, discriminant validity is supported when the square root of the AVEs is greater than the correlation coefficients between the constructs (39). From Table 4, it is seen that discriminant validity issues were observed for "doctor services" (in relation to "information"); "organisation" (in relation to "doctor services", "nurse services", and "information"); and "standard" (in relation to "organisation"). This means that these 3 dimensions were not distinct enough from the others for each to measure the different sub-concepts under patient experience.

Construct validity, item loadings, and deletion
Construct validity for the items was examined by checking item loadings (parameter estimates) on their dimensions. Generally, good loadings were recorded as a majority of the items had loadings above 0.60. The item loadings ranged from 0.88 (on "discharge") to 0.55 (on "standard"). Two items had loadings below 0.60: 0.58 (ORG 2) and 0.55 (ST 5). Based on the suggestion of the master validity tool (43), these items together with a third (ST4) were deleted in a bid to boost the validity of the measure. Item loadings before and after deletion are presented in Table 5. After deletion, the dimension "standard" recorded an increase in AVE value, indicating that the remaining 4 items explained more variance in the dimension than the original 6 items, seen in Table 6. Figure I presents the model after item deletion as well as validity and reliability checks. See Model 10 in Table 3 for the fit indices of this model.

Criterion-related validity
The study then assessed the predictive validity of the model based on its ability to relate to and predict outcome variables ascertained in existing literature. Overall satisfaction, health benefits, and health level were used as outcome variables while the patient experience measure and its dimensions were used as predicting variables. Patient experience measure and dimensions were computed with retained items after item deletion, and standard multiple regression was performed. The results showed that overall patient experience and each individual dimension related to and predicted at least 1 outcome variable positively and significantly. These results are presented in Table 7.

Proposed measurement model
A proposed model (see Model 11 in Table 3) was developed, taking into consideration the frequencies of missing values for the items and the discriminant validity concerns. Items with missing values of more than 20% were excluded; therefore, the dimensions of "discharge" and "interaction" were removed from the model. The items on "next of kin" had more than 20% missing values, but the dimension was maintained. The questions were the following: "NK1: Were your relatives well received by the hospital staff?" and "NK2: Was it easy for your relatives to get information about you while you were in the hospital?" These questions were maintained because, unlike the other dimensions, relating to and answering them depended on factors that are largely beyond the control of the patient, such as whether or not the patient had any relatives alive who visited the hospital and whether the patient stayed in the hospital long enough for relatives to visit the hospital. A second-order factor was added in the proposed model for "nurse services", "doctor services", "information", and "organisation", collectively labelled "treatment services". This was based on the discriminant validity results, correlations among them, and the nature of the questions asked under these dimensions. The 2 lowest loading items (ORG 2 and ST 5) that were previously deleted were still excluded from this model. The proposed model showed excellent fitness to the data (similar to Model 10) and also met convergent, discriminant, and criterion-related validity requirements. See Figure II for the proposed model. Table 8 presents comparisons of tools and findings between the validation study by Pettersen, Veenstra (15) and the current study.

Discussion
This study presents some major ndings. First, the study con rmed that the 8-factor model showed good tness to the data. The model achieved con gural and metric invariance but not scalar invariance. The study also found that reliability values were all acceptable and all the dimensions, except "standard", attained the recommended 0.50 AVE value for convergent validity. With regard to discriminant validity, "doctor services" (in relation to "information"), "organisation" (in relation to "doctor services", "nurse services", and "information") and "standard" (in relation to "organisation") had issues. Construct validity and criterion-related validity were supported for majority of the results. Finally, a model including a second-order factor was proposed. The second-order factor, named "treatment services", consisted of 4 rst-order factors: "nurse services", "doctor services", "information", and "organisation". Moreover, the dimensions of "standard" and "next of kin" were included in this nal model, but "discharge" and "interaction" were excluded. Hence, the nal model included one second-order factor comprising 4 subfactors as well as "standard" and "next of kin".
The dimensions with associated items found in this study were similar to those found by Pettersen, Veenstra (15) while some dimensions, such as "doctor services", "nurse services", "organisation", "information", and "hospital standards", overlapped with dimensions found by other studies (12,13,20). Invariance tests conducted in the present study were absent in the study by Pettersen, Veenstra (15), which marks a good contribution of this study. The tests showed that the model achieved invariance across the aggregated departments with regard to structure and pattern (con gural) as well as the loadings of the items on their respective dimensions (metric). However, scalar invariance was not achieved for this model. Considering the diverse nature of the sample, as well as the aggregation of the departments into broad categories, this nding was expected. Putnick and Bornstein (44) asserted that scalar invariance is the most stringent compared to con gural and metric, and instances of rigid scalar non-invariance could mean that the construct is generally variant across different groups. The ndings also showed that reliability was good, based on composite reliability values, similar to the Cronbach's alpha values obtained by Pettersen, Veenstra (15).
With regard to validity tests, the study found that all the dimensions, except "standard", attained the recommended 0.50 AVE value for convergent validity, similar to other related studies that examined similar dimensions using other instruments (45). However, discriminant validity issues were found for "doctor services" (in relation to "information"), "organisation" (in relation to "doctor services", "nurse services", and "information") and "standard" (in relation to "organisation"). Discriminant validity was also not examined in the study by Pettersen, Veenstra (15), thus indicating another good contribution of this study. Examining the wordings of the items gives some possible explanation for this finding. For instance, D1 under "doctor services" was worded as "Did the doctors talk to you so you understood them?", while questions under "information" included "IF2. Did you know what you thought was necessary about the results of tests and examinations?" and "IF3. Did you receive sufficient information about your diagnosis or your complaints?" It is highly likely that patients will receive information on results and diagnosis mainly from their doctors and, as such, answers to questions under "information" may be significantly influenced by the perception of how well the doctors spoke to these patients. Similarly, questions under "organisation" were "OR1. Did you find that there was a permanent group of nursing staff that took care of you?", "OR2. Did you find that one doctor had the main responsibility for you?", "OR3. Did you find that the hospital's work was well organized?", and "OR4. Did you find that important information about you had come to the right person?" These questions feature clear wording relating to "nurse services", "doctor services", "information", and "standard", and it is therefore not surprising that no clear distinctions were found among them as constructs. Construct validity was also achieved, with a majority of the items recording loadings above 0.60.
This was also achieved in the validation study by Pettersen, Veenstra (15) using a different method and in related studies using other instruments with similar dimensions (12,13). One item on "standard" and one on "organisation" were, however, deleted due to loadings below 0.60, while another on "standard" was deleted in a bid to improve the discriminant validity. Perhaps the wording of these questions made them difficult for patients to understand clearly and respond accordingly. For instance, item S5 was framed as "Was the food satisfactory?" Patients may be left to decide what is meant by 'satisfactory', thus making the question too vague, or perhaps the different dietary requirements and preferences made this question more loosely defined. Again, item OR2 was framed as "Did you find that one doctor had the main responsibility for you?", a question probably dependent on the ailments of the patient and likely to be out of the control of hospital organisation. Thus, if a patient's ailments require more than a single main doctor, then this question may suggest to the patient that having 2 or more main doctors reduces the ability of the hospitals to organise their work well.
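The convergent and discriminant validity checks discussed above can be illustrated with a minimal sketch. It assumes standardized loadings and uncorrelated errors, and it uses the Fornell-Larcker comparison (AVE against the squared inter-factor correlation) as the discriminant validity criterion; the criterion actually applied in the analysis is not stated in this section. The loadings and the correlation below are hypothetical values chosen for illustration and are not results from the PEQ data.

```python
def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability for standardized loadings with uncorrelated errors."""
    total = sum(loadings)
    error = sum(1 - l * l for l in loadings)  # error variance of each item
    return total * total / (total * total + error)


def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """True when both AVEs exceed the squared correlation between the factors."""
    shared = corr_ab ** 2
    return ave_a > shared and ave_b > shared


# Hypothetical standardized loadings (NOT taken from the study).
doctor = [0.80, 0.75, 0.70]
information = [0.78, 0.72, 0.74]
corr_doctor_information = 0.90  # hypothetical inter-factor correlation

convergent_ok = ave(doctor) >= 0.50 and ave(information) >= 0.50
discriminant_ok = fornell_larcker_ok(
    ave(doctor), ave(information), corr_doctor_information
)
print(convergent_ok, discriminant_ok)
```

With these illustrative numbers both factors pass the 0.50 AVE threshold, yet the pair fails the discriminant check because the factors share more variance with each other than each shares with its own items. This is the kind of pattern that motivates grouping highly correlated first-order factors under a second-order factor.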
Criterion-related validity was ascertained for the overall measure as well as the dimensions in predicting at least 1 of the 3 outcome variables: satisfaction, health benefits, and health level, which is consistent with previous studies (e.g. 33,34,35). Lastly, a model with a second-order factor, "treatment services", for 4 of the dimensions was proposed based on the results of the validity and reliability analyses: "nurse services", "doctor services", "information", and "organisation". This constitutes the most important contribution of this study since this possibility was not explored in the study by Pettersen, Veenstra (15), perhaps owing to the absence of discriminant validity examinations in their study, and since this indicates a change in the factor structure of the PEQ. Rindskopf and Rose (46) observed that second-order factors reflect relationships among first-order factors. It is worth noting that related studies that developed other PREMs for generic and specific health issues also found these 4 dimensions in common (e.g. 11,13,20,47). Although these studies did not develop a second-order factor for these dimensions, this is indicative of the prominence of these 4 variables in measuring and understanding patient experiences. The current finding, therefore, builds on this prominence to illustrate the high interrelationships and inextricable links among these factors, which brings some theoretical and practical implications to the fore.

Theoretical Implications
This study brings a very important, yet mostly ignored, contribution to the patient experience and quality healthcare literature: a need for more validation studies and surveys on patient experiences. The study responds to the recommendation by Pettersen, Veenstra (15) that existing PREMs need scrutiny and also tackles the research gap identified in the matrix by Beattie, Murphy (16), indicating that the PEQ by Pettersen, Veenstra (15) lacked some validity analyses. This buttresses the claim that changing theoretical underpinnings influence existing measures and that changing statistical methods and tools can reveal weaknesses of measures; moreover, this should be countered by regular psychometric appraisals of these measures. The results also contribute to the views of some researchers (e.g. 17,18) regarding the need to repeat patient experience surveys to generate more reliable data for policy-making. The assessment of patients' perspectives of hospital care would have to be reliable and valid enough to elicit accurate information about their experiences, constructs, and outcomes. Thus, it is imperative to ensure that these instruments always perform optimally and generate reliable information on how to improve quality of care and hospital experiences. These results, therefore, provide a background for further studies to be conducted on PREMs.
Another major contribution of this study is the finding of a second-order factor labelled "treatment services", which consists of 4 factors: "nurse services", "doctor services", "information", and "organisation". This means that there exist strong and significant relationships among these dimensions (46). This finding also means that a single dimension or factor could adequately account for all 4 dimensions and could be identified as a major sub-dimension that captures these 4 dimensions. The "treatment services" factor has implications for the conceptualisation of patient-oriented hospital service climates. Patients in these hospitals may have highly overlapping experiences across "nurse services", "doctor services", "organisation", and "information". In more specific terms, it can be said that these patients experience a main dimension that accounts for significant portions of the 4 dimensions, perhaps because of the way these factors play out in the hospitals. For instance, doctors provide information regarding patients' health, ailments, and treatments while nurses organise and assist patients with the treatment process. This is significant in advancing knowledge of patient experiences. The experience of these 4 dimensions may not be that distinct, and patients, in experiencing service climate in the hospitals, may not adequately distinguish their shared perceptions of "doctor services" from "information" or of "nurse services" from "organisation", for instance. Patient-oriented hospital service climate, as conceptualised in this study, may include 2 levels of factors that influence patients' perceptions and experiences of healthcare. Based on this, the definition of patient-oriented hospital service climate is modified as: patients' shared perceptions and experiences across different levels of hospital work environments, the services, and formal and informal practices and procedures that make up the entire caregiving process and that inform patient outcomes.
This contribution is also a major highlight when compared to the study by Pettersen, Veenstra (15), in which discriminant validity was not examined and a resulting second-order factor analysis was not explored. This challenges the theoretical structure of the PEQ and theoretical distinctness among these factors. Therefore, this study suggests a change in the factor structure of the PEQ and the development of a second-order factor for these 4 dimensions in the general patient experience literature. These possibilities are worth exploring in further surveys and studies on hospital factors as patient experiences during the caregiving process.
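For readers less familiar with second-order structures, the proposed model can be written in generic confirmatory factor analysis notation. The symbols below are an assumed, conventional notation and do not appear in the original analysis: $x_{ij}$ is item $i$ of first-order factor $\eta_j$, and $\xi$ is the second-order factor "treatment services".

```latex
\begin{align}
x_{ij} &= \lambda_{ij}\,\eta_j + \varepsilon_{ij}, \\
\eta_j &= \gamma_j\,\xi + \zeta_j,
\qquad j \in \{\text{nurse services},\ \text{doctor services},\ \text{information},\ \text{organisation}\}
\end{align}
```

Under this sketch, the correlations among the 4 first-order factors are accounted for by their common dependence on $\xi$ through the second-order loadings $\gamma_j$, while "standard" and "next of kin" remain separate first-order factors outside $\xi$.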

Practical Implications
Quality healthcare delivery is not exclusive to a region or country but a general goal of all healthcare systems worldwide. Generating accurate information on how healthcare users experience healthcare systems contributes to this goal. The results from this study suggest that it is not enough to develop a good measure of patient experiences; it is also imperative to review and reassess the ability of the measure to keep generating accurate information on patients' experiences and health. The questions in the PEQ may have to be revised in order to elicit more concise and accurate information from patients.
Furthermore, some dimensions, such as "next of kin", seemed not to be relatable to most of the patients, judging from the many missing values and invariance tests. Also, the PEQ should be administered with the second-order factor taken into consideration. It is imperative to analyse "nurse services", "doctor services", "information", and "organisation" as a second-order factor, as shown in the proposed model, due to the validity issues that were found in the analysis. This can provide researchers and management with adequate knowledge of what patients experience during the caregiving process. Moreover, management must take the interrelationships in the second-order factor into account to make meaningful, informed, and sustainable changes in the hospitals for patients. The second-order factor must be considered as a single climate encompassing these 4 dimensions, where patients' perceptions of and interactions with one factor have a ripple effect on the others. Such considerations in policies and practice can help management and workers to reduce errors that may have dire consequences.

Limitations and directions for future research
This study employs data that is not at the national level but from a health region in Norway. That notwithstanding, the study has good generalisability power owing to the similarity in hospital and healthcare systems across the regions in Norway. Generalising to other countries, however, is difficult due to the differences in culture and healthcare systems. The findings require additional research in different countries for further justification. Therefore, future studies on reassessing psychometric properties of PREMs may want to employ larger data sets, for instance at the national level or across regions, to further investigate and develop the measurement quality of such surveys. Furthermore, future research should adopt the proposed model (with the second-order factor) from this study and examine it empirically to confirm it or otherwise, within health sectors across different countries.

Conclusion
Hospital management should know and consider the views and experiences of the people they care for if their services are to be influential in improving patients' health. The results of this study show that changes in psychometric analytical tools and methods can indeed highlight possible weaknesses and inadequacies in measures, as seen with the PEQ. This is evident in analyses such as invariance, discriminant validity, and second-order factors conducted in the current study but absent in the earlier study. Therefore, repeated surveys are needed to improve the measures' performance. The results also indicate possible changes with regard to dimensionality of PREMs, owing to the second-order factor finding. This calls for adequate attention, from researchers and hospital management alike, to the interrelationships among some of the dimensions, as this has important implications for theory and practice in healthcare. Management should consider these relationships in making decisions concerning the quality of care for patients, while researchers should delve more into studies that ascertain the psychometrics and dimensionality of PREMs.

This study, with regard to data collection, analysis, and compilation, was conducted within the ethical and legal provisions and guidelines of the Norwegian Institute of Public Health (NIPH) and the University of Stavanger. The Norwegian Data Protection Authority and the Norwegian Directorate of Health approved the procedures in the survey. The hospital data protection official assessed the data processing in the hospitals where survey extension took place. Informed consent was obtained from participants in the survey. Respondents were informed that participation was voluntary and they were assured of confidentiality of the information they would provide. Respondents were also informed that they could opt out of the survey at any point, as well as of the procedure for opting out if they wished.
Data was stored in a password-protected repository accessed only by the researchers. This study did not present results that revealed patients' identities, thus maintaining anonymity of respondents and confidentiality of responses. All relevant ethical requirements were duly upheld.

Not applicable
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Tables
Table 1 Fitness indices and acceptable
Note: *These models were assessed with the modification estimates. 1st: nurse and doctor into one factor; 2nd: nurse, doctor and organisation into one factor; 3rd: nurse and organisation into one factor; doctor and information into one factor; 4th: nurse, doctor, organisation and information into one factor; next of kin and standard into one factor; discharge and interaction into one factor; 5th: nurse, doctor, organisation, information, next of kin and standard into one factor; discharge and interaction into one factor; 6th: all dimensions into one factor.
Note: Items marked with * had the lowest loadings