Determination of Turkish Norms of Psychometric Tests for Diagnosing Minimal Hepatic Encephalopathy and Proposal of a Simplified Paper-Pencil Test Battery

Background: The psychometric hepatic encephalopathy score (PHES) is used to diagnose minimal hepatic encephalopathy (mHE) but requires country-specific standardization, and administering it is cumbersome. Aims: To standardize PHES for Turkish patients and compare it with German norms; to determine mHE prevalence with two different methods and to assess whether the PHES test can be simplified. Methods: Healthy volunteers (n=816; 400 male) and cirrhotics (n=124; 58 male) were included. The PHES score threshold was set at ≤ -5 points and that of critical flicker frequency (CFF) at <39 Hz for mHE diagnosis. For comparing German and Turkish norms, datasets were combined and a multiple backward procedure was applied. Receiver operating characteristic (ROC) curves were created for assessing the diagnostic capabilities of single subtests of the PHES test. Results: Turkish norms displayed non-linear dependence on age and education. The rate of mHE in compensated cirrhotics was 29.8% and 27.4% with the PHES and CFF tests, respectively. Concordance between the two tests was low (kappa coefficient 0.389); mHE prevalence decreased to 16% when both tests were combined. Turks performed worse than Germans in the digit symbol (DS) and serial dotting (SD) subtests but generally performed better in the other subtests. In ROC analyses of subtests, AUROC was 0.974 for the DS+SD tests combined. Conclusions: PHES norms for Turks were developed. Use of two methods for diagnosing mHE may be important for research purposes. From a clinical perspective, sensitivity with acceptable specificity may suffice. Combined use of the DS and SD subtests of the PHES battery may be well suited for use in clinical practice.


Introduction
Hepatic encephalopathy (HE) is a brain dysfunction caused by liver insufficiency and/or portosystemic shunting. It manifests as a wide spectrum of neurological or psychiatric abnormalities ranging from subclinical alterations to coma [1]. Minimal HE (mHE) represents the mildest form of HE, which is not recognizable by a standard neurological examination. However, deficits in attention, working memory, psychomotor speed or visuospatial ability are detected by means of psychometric tests. Furthermore, electrophysiological tests such as EEG and evoked potentials or psychophysiological tests such as critical flicker frequency (CFF) may show alterations of brain function in these patients [2,3].
Despite its concealed or covert clinical characteristics, several reports point to the clinical significance of mHE: it impairs quality of life [4-6], affects fitness to drive [7-9], and predicts overt HE [10,11] and mortality from liver disease [12,13]. Further, the working capacity of patients with mHE may be reduced, as indicated by early retirement of blue-collar workers with mHE [6].
The prevalence of mHE varies from 20% to 84% in different studies [1]. This high range of variation between studies depends mainly on two factors: the different methods employed for mHE diagnosis and the characteristics of the population studied [14,15]. Among the latter, variables including cultural background, age and education are known confounding factors affecting test performance. Further, alcohol abuse and other accompanying diseases need to be considered as important confounding factors [15,16].
Methods used for diagnosing mHE can be grouped into psychometric and neurophysiological tests. The most commonly used psychometric test is the PSE (Portosystemic Encephalopathy)-Syndrome test, which yields the psychometric hepatic encephalopathy score (PHES) [17]. Other frequently used tests are the continuous reaction time test (CRT) [18], inhibition control test (ICT) [19], Stroop [20] and SCAN tests [12].
Neurophysiological measures include the EEG and evoked potentials. Another frequently used method is the critical flicker frequency (CFF) test [21]. Each method has its advantages and disadvantages, and the combination of at least two testing strategies is recommended, especially for multi-center studies, provided that one method is a psychometric test [22].
Of the many methods available, the PHES test is the one most often used for the diagnosis of mHE. The test has been thoroughly validated. Hamster and Schomerus [23], who developed the test battery, had assessed numerous psychometric tests for their use in diagnosing HE before they finally created the PHES test battery with the following paper-pencil tests: Digit Symbol Test (DST), Number Connection Test-A (NCT-A), Number Connection Test-B (NCT-B), Serial Dotting Test (SDT), and Line Tracing Test (LTT). This test battery has been considered the gold standard for the diagnosis of mHE by experts in the field [24]. One concern with this test battery was the learning effect when repeated. Therefore, four parallel versions of the test have been elaborated and validated for follow-up examinations.
As a neuro- or psychophysiological test, the CFF test seems attractive since it is easy to perform and may have a pathophysiological link to HE as well. It was reported that in patients with liver cirrhosis retinal glial cells present pathological changes, named 'retinal gliopathy', reminiscent of type II astrocytosis in the central nervous system [30].
In the present study, we primarily aim to establish Turkish norms for the PHES test battery, and then determine the prevalence of mHE in patients with compensated cirrhosis by using two different methods for the detection of mHE, namely the PHES test battery and the CFF test, as recommended in the recent EASL-AASLD guidelines [33]. In addition, using receiver operating characteristic curve analyses, we explored whether the current five-test assessment for diagnosing mHE can be replaced by a simplified test battery.

Selection of Patients and Healthy Subjects
This study was conducted on a total of 816 healthy subjects distributed evenly according to age, sex, and educational background characteristics and 124 patients with a diagnosis of liver cirrhosis at the Gastroenterology Department of Ankara University Medical School from 01/01/2015 to 7/31/2015. Figure 1 shows the flow diagram of the study.
The study consisted of two parts. In the first part, normal values of the PHES tests in the Turkish population based on age and educational level were determined and compared to German norms [17]. In the second part, the prevalence of mHE in compensated cirrhotic patients was explored. For the first part, healthy subjects, 18 to 70 years old, were enrolled (Suppl. Table 1). Subjects who had consumed alcohol or had used psychoactive drugs within the last 3 months were excluded. Further exclusion criteria were visual or hearing impairment, neurological diseases and illiteracy. For the cirrhotic patients, besides the exclusion criteria listed for healthy subjects, decompensated cirrhosis, a history of HE, esophageal variceal bleeding, hepatocellular carcinoma and any significant concomitant disease which might have affected the results of the study, such as neuropsychiatric disease, respiratory, kidney or heart failure, or any malignant disease, were exclusion criteria as well. The study was conducted in line with the Declaration of Helsinki. The study was approved by the University of Ankara Medical School Ethics Committee. All patients gave written informed consent.

Neuropsychometric Tests
The PHES test battery applied to patients and healthy subjects consisted of the DST, NCT-A, NCT-B, SDT and LTT. LTT was assessed separately based on time and the number of errors made.
PHES test score calculation was made according to the study of Weissenborn et al [17]. Thus, for each test the score was depicted as 0 when the test value was between -1 SD to +1 SD; the score was -1 for values between -1 SD to -2 SD; it was -2 for values between -2 SD to -3 SD, -3 for values below -3 SD, and the score was depicted as +1 when the values exceeded +1 SD. Thus, the total score interval ranged between -18 and +6. Scores equal to or lower than -5 were considered indicative of mHE based on the results in the norm population. The cut-off is the same as in the study by Weissenborn et al [17].
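The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each subtest result is assumed to be already expressed as a deviation `z` from the age- and education-adjusted norm mean in SD units, with positive values meaning better performance (so time-based subtests would need their sign flipped beforehand). The handling of values lying exactly on a boundary is not specified in the text and is an assumption here, as are the function names and the example values.

```python
def subtest_score(z):
    """Map a subtest deviation z (in SD units, positive = better) to its score.

    Boundary handling (>=) is an assumption; the text does not specify it.
    """
    if z > 1:
        return 1      # better than +1 SD
    if z >= -1:
        return 0      # within -1 SD to +1 SD
    if z >= -2:
        return -1     # between -1 SD and -2 SD
    if z >= -3:
        return -2     # between -2 SD and -3 SD
    return -3         # below -3 SD


def phes_total(z_values):
    """Sum the six subtest scores (DST, NCT-A, NCT-B, SDT, LTT time, LTT errors)."""
    if len(z_values) != 6:
        raise ValueError("expected six subtest results")
    return sum(subtest_score(z) for z in z_values)


def indicates_mhe(total, cutoff=-5):
    """A total score equal to or lower than the cut-off is indicative of mHE."""
    return total <= cutoff


# Hypothetical patient, illustrative values only (not study data).
example = [-2.4, -1.3, -0.2, -3.5, 0.4, -1.1]
total = phes_total(example)
print(total, indicates_mhe(total))
```

With six subtests scored from -3 to +1, the sketch reproduces the total score interval of -18 to +6 stated above.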
Critical flicker frequency (CFF) measures the threshold at which the impression of fused light switches to the impression of flickering light, as perceived by the patient, when the frequency of light pulses from a light source is decreased stepwise by 0.1 Hz from 60 Hz downwards. A total of nine measurements were recorded and their mean was calculated. CFF <39 Hz was taken to be the threshold for mHE, in line with previous studies [34]. CFF and the PHES test battery were administered in the same session.

Statistical Analysis
Data were evaluated with the SPSS 15.0 statistical package. Descriptive statistics were presented as mean ± standard deviation, or median and percentiles. Statistical methods included the Chi-Square test and Fisher's test for categorical variables and Student's t and Mann-Whitney U tests for continuous variables. Correlation between continuous variables was evaluated with Spearman's correlation analysis for non-normal distributions and with Pearson's correlation analysis for normal distributions. P<0.05 was considered significant.
For determination of the Turkish norms for the PHES test, a transformation (log for NCT-A, log-log for NCT-B, log-log-log for SDT, square root for DST, log-log for LTT time, and cube root for LTT error) was performed for each subtest, as the data from healthy individuals did not conform to a normal distribution as assessed with the Kolmogorov-Smirnov test. In the transformed scales, norm limits were defined as the age-dependent mean and deviations of k = -1, +1, +2, +3 (NCT-A, NCT-B, LTT time, SDT, LTT error) or k = +1, -1, -2, -3 (DST) standard deviations from the mean value. Thus, deviations indicating worse performance in a subtest are classified into three categories, while better performances form a single category. Retransformation for the log transformation is based on the formula: exp(a + b*age ± k*s).
For the iterated logarithm (log-log), the retransformation formula is: exp(exp(a + b*age ± k*s)), and for the threefold logarithm (log-log-log): exp(exp(exp(a + b*age ± k*s))).
For the square root transformation it is: (a + b*age ± k*s)^2, and (a + b*age ± k*s)^3 for the cube root transformation.
Here, a denotes the constant (intercept) of the regression line in the transformed scale, b the slope of the line and s the standard deviation of the residuals. k is the integer value to be replaced by the defined number of standard deviations for the different norm limits.
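As a minimal sketch of how these retransformations could be applied, the following Python function computes a norm limit in the original scale from a, b, s, age and k. The function name, the transformation labels and all coefficient values in the example are hypothetical and chosen only for illustration; the actual regression coefficients are those reported in the study's tables.

```python
import math


def norm_limit(a, b, s, age, k, transform):
    """Norm limit in the original scale for k residual SDs from the
    age-dependent mean, given intercept a, slope b and residual SD s."""
    y = a + b * age + k * s  # limit in the transformed scale
    if transform == "log":
        return math.exp(y)
    if transform == "log-log":
        return math.exp(math.exp(y))
    if transform == "log-log-log":
        return math.exp(math.exp(math.exp(y)))
    if transform == "sqrt":
        return y ** 2
    if transform == "cbrt":
        return y ** 3
    raise ValueError("unknown transformation: " + transform)


# Hypothetical coefficients for a log-transformed, time-based subtest.
a, b, s = 3.0, 0.01, 0.25
for k in (-1, 1, 2, 3):
    print(k, round(norm_limit(a, b, s, age=40, k=k, transform="log"), 1))
```

Passing a negative k covers the minus branch of the ± sign, so a single function serves both directions of deviation.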
The impact of covariates on the test results was analyzed as follows: for each subtest, age was always included in the regression model, which was applied to the transformed subtest results as described above. The two further covariates (education and gender) were included and handled in a stepwise procedure: after inclusion of both of them, at each step, the covariate with the highest p-value was excluded from the model as long as this p-value was > 0.05. Formal education was handled as a categorical variable with contrasts of "H" (high school) and "U" (university) vs. "P" (primary school). For each subtest, additional outliers were excluded from the model. The final regression model then contained age together with the covariates relevant for this subtest. As a consequence, the norm values were recomputed for each subtest taking into account the relevant covariate values of each subject.
The homogeneity of variances among different groups was analyzed with the Levene test. Individuals falling outside of ±3 standard deviations were excluded from the analysis. The distribution of test data according to educational background and age was assessed with a two-way analysis of variance (two-way ANOVA).
In analyzing German [17] and Turkish norms, in order to compare the regression characteristics for subtests with equal transformation, the datasets were combined, and a multiple backward procedure was applied allowing for different intercepts and slopes in the Turkish and the German populations. The differences were tested within the framework of the linear model and were eliminated from the model if p > 0.05. The standard deviations of the residuals were compared by applying Levene's test of equal variances. The test results are given in the last column of table 1.
Finally, receiver operating characteristic (ROC) curves were created for assessing the diagnostic capabilities of single subtests of the PHES test. Then, ROC curves for combinations of two single subtests were established. The areas under the ROC curves (AUROC) of single or double subtests were compared using Hanley-McNeil tests [35]. A p-value of <0.05 was considered significant.
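For readers unfamiliar with the AUROC, it can be understood as the probability that a randomly chosen case with the condition receives a worse score than a randomly chosen case without it. The sketch below computes it by direct pairwise comparison; it is not the SPSS procedure used in the study, the Hanley-McNeil comparison is not shown, and the scores are hypothetical values chosen for illustration.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the lower (worse) score; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical subtest scores (not study data): lower means worse performance.
with_mhe = [-3, -2, -2, -1]
without_mhe = [-1, 0, 0, 1]
print(auroc(with_mhe, without_mhe))
```

An AUROC of 1.0 corresponds to perfect separation of the two groups and 0.5 to no discrimination.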

Results
The study included 816 healthy controls and 124 patients with compensated cirrhosis. Healthy subjects had a mean age of 40.5 ± 14 years (mean ± SD) and 400 (49%) were male. While 30% (246) of healthy subjects were primary school graduates, 33% (271) were middle or high school graduates and 37% (299) were university graduates or above. The distribution of healthy subjects according to age and educational level is depicted in the Suppl. Tables. Analysis of test results displayed a heterogeneous distribution. A normal distribution was achieved after logarithmic transformation.
As shown in Figure 2, a non-linear dependence on age by linear regression analysis was found with all paper-pencil tests assessed. An influence of formal education is present in all six subtests (with advantages of "H" and "U" vs. "P" as given by the two regression coefficients) (Fig. 3). Gender contributes to the norm values only in the Serial Dotting subtest, with better results for women. Based on these data, Turkish norms of the PHES test battery were created taking into account age and level of education. In Suppl. Table 3, normal values with standard deviations are presented according to age and education level for every paper-pencil test.
The distribution of the PSE sum scores (mean value: -0.121; standard deviation s = 2.035) in the Turkish population (after the exclusion of outliers in the subtests) is shown in Suppl. Table 4. The cut-off point for out-of-norm negative (pathological) values was set at ≤ -5 points; 19 of 785 cases (2.4%) fell outside the normal range.

Comparison of German and Turkish Norms
The derivation of the German (Hannover) norm values was based on N=249 for the NCTA and NCTB and on N = 120 healthy subjects for the other four subtests [17].
As shown in Table 1, the transformations used to obtain linear dependency between age and test results and homoscedasticity of the residuals were the same in the Turkish and the German population only for the subtest NCT-A. The discrepancies in the other subtests were mainly due to the finding that in the Turkish population the standard deviations of the residuals were still increasing with age after a simple (log or square root) transformation.
In all five subtests the age-adjusted values differed significantly in the two populations (p < 0.001 to p = 0.006; see Table 1). For NCT-B, LTT time and LTT error, higher subtest score values were seen in the Turkish compared to the German population (i.e. Turkish subjects generally needed less time for the Number Connection Test B and for the Line Tracing Test, and made fewer errors in the Line Tracing Test). However, for the DST and SDT, lower subtest scores were observed for the Turkish compared to the German population (i.e. generally lower numbers were achieved in the Digit Symbol Test and more time was needed for the Serial Dotting Test).
Comparison of the PSE sum scores in the German and Turkish healthy populations.
In order to conduct an overall comparison of the German and the Turkish norm population data, the German norms and the Turkish norms were applied to both populations. Thus, a comparison between the populations and between the norm values was possible on the basis of each of the two norm definitions.
On the basis of the Hannover norm definition, the mean sum score was significantly lower (that is, worse) in the Turkish population (-1.22 vs. 0.025; unpaired t-test: p < 0.001). Also, the standard deviation was significantly higher (2.987 vs. 1.994; p < 0.001; Levene test of equal variances, Suppl. Table 5).
If the Turkish norm is applied, no significant differences are found between the populations. In the Turkish population, the 19 out-of-norm cases would also be classified as out of norm by the Hannover norm definition, but 142 of 785 subjects classified as normal would be classified as out of norm if the Hannover scores were applied (Suppl. Fig 1). This indicates that the Turkish norm limits are generally wider. The difference is significant (p < 0.001; McNemar test). The wider norm limits in the Turkish compared to the German population are most likely an effect of education. This suggests that education standards are more homogeneous in Germany and more heterogeneous in Turkey. The difference between the norm definitions is mainly apparent for negative sum score results (Suppl. Fig 1).

Minimal HE prevalence in patients with compensated cirrhosis
Once PHES test cut-offs for the Turkish population were established, the study proceeded to the second stage, i.e. estimation of mHE prevalence in patients with compensated cirrhosis. To this end, two different methods as recommended by the latest EASL-AASLD guidelines were used: the PHES test battery for neuropsychometric assessment, and the CFF test as a neurophysiological method [1].
Patients with cirrhosis were older and less well educated compared to healthy controls (Suppl. Table 2). Within the compensated cirrhotic patient group, the most frequently observed etiology of liver disease was chronic hepatitis B (21.8%), followed by cryptogenic cirrhosis (11.3%) and chronic hepatitis C (11.3%) (Suppl. Table 6). The majority of the so-called cryptogenic cirrhosis cases are most likely due to non-alcoholic liver disease. When only patients who were outside the norms with both tests were considered to have mHE, the prevalence of mHE was 16% (20/124). We assessed the agreement of the CFF test with the PHES test battery using Cohen's kappa coefficient of concordance, which was 0.389 (p<0.001). Although statistically significant, this value indicates only weak concordance.
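The reported kappa can be checked with a short calculation. The sketch below rests on an assumption that should be read with care: the 2×2 table is reconstructed from figures given in the text (20 of 124 patients abnormal with both tests; 29.8% and 27.4% of 124, i.e. 37 and 34 patients, abnormal with PHES and CFF, respectively) rather than taken from a table of the study. The function name is hypothetical.

```python
def cohen_kappa(both_pos, only_a, only_b, both_neg):
    """Cohen's kappa for two binary tests from the four cells of a 2x2 table."""
    n = both_pos + only_a + only_b + both_neg
    observed = (both_pos + both_neg) / n          # observed agreement
    a_pos = both_pos + only_a                     # positives by test A
    b_pos = both_pos + only_b                     # positives by test B
    # agreement expected by chance from the marginal totals
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n ** 2
    return (observed - expected) / (1 - expected)


# Reconstructed counts (assumption): 37 PHES-positive, 34 CFF-positive,
# 20 positive with both tests, 124 patients in total.
kappa = cohen_kappa(both_pos=20, only_a=17, only_b=14, both_neg=73)
print(round(kappa, 3))
```

Under these reconstructed counts the two tests agree in 93 of 124 patients (75%), but about 59% agreement would be expected by chance alone, which is why kappa remains low.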

Assessment of PHES subtests to replace the whole PHES test battery:
We performed receiver operating characteristic (ROC) analyses of the PHES subtests for predicting mHE, using the PHES total score as reference. Of the PHES subtests, the highest areas under the ROC curve (AUROC) were achieved with the DST, NCT-A and SDT (Table 2). The DST performed better than NCT-B, LTT time and LTT error (p<0.05 for all by the Hanley-McNeil test, Table 3). In comparison with the PHES test, the AUROC for the DST was 0.925. With a cut-off point of -0.5, the sensitivity was 94.6% and the specificity 79.3% for predicting mHE based on the PHES total score (Cohen's kappa = 0.657). The AUROC was 0.974 when DST and SDT were combined, and this two-subtest combination had the best AUROC compared to other binary tests (Table 3). Further, the triple test combination with the highest AUROC (DST+SDT+NCT-A) did not do better than this dual test combination in diagnosing mHE (Table 3). For this pair, at a cut-off of -1.5, sensitivity was 97.3% and specificity 86.2% for diagnosing mHE (Cohen's kappa = 0.769) (Table 2). Thus, out of 37 patients diagnosed with mHE according to the PHES test battery, 36 were identified with this binary test combination. However, out of 87 patients without a diagnosis of mHE with the PHES test battery, 12 patients would have been misdiagnosed with mHE. For a cut-off of -2.5, sensitivity was 78.4% and specificity 97.7% (Cohen's kappa = 0.798). Among the other PHES subtests and their combinations examined, this pair gave the result closest to the PHES total score. We performed ROC analysis similarly for the CFF test. The AUROC for the CFF test to predict the PHES total score based mHE diagnosis was 0.747. When the cut-off was set at 39.0 Hz, the sensitivity and specificity were 54.0% and 83.9%, respectively.
Comparison of demographic, biochemical and non-invasive fibrosis markers in patients with or without mHE: As shown in Suppl. Table 8, patients with mHE according to the PHES had higher MELD and Child-Pugh scores than those without. They had a more advanced Child-Pugh class and higher FIB-4 scores. Blood ammonia and total bilirubin levels were higher and albumin was lower. Child-Pugh score and class as well as albumin level were also associated with mHE diagnosis by the CFF test, but age had the most significant effect on CFF (Suppl. Table 9).
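The sensitivity and specificity quoted for the DST+SDT pair at a cut-off of -1.5 follow directly from the counts given in the text (36 of 37 patients with mHE detected; 12 of 87 patients without mHE misclassified). A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity and specificity from the four cells of a 2x2 table,
    with the PHES total score taken as the reference diagnosis."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Counts from the text: 36 of 37 detected, 12 of 87 false positives.
sens, spec = sensitivity_specificity(tp=36, fn=1, fp=12, tn=87 - 12)
print(round(sens * 100, 1), round(spec * 100, 1))
```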

Discussion
The present study had three aims. The first and main aim was to establish Turkish normal values for the Porto-Systemic Encephalopathy (PSE) test, which provides the psychometric hepatic encephalopathy score (PHES), and to compare them with German norms. The present study has standardized the PHES test battery for Turkish people according to age and educational background. We have provided norm values for every test according to age and level of education as Suppl. material. The second aim of the study was the assessment of mHE prevalence in cirrhotic patients according to the newly established cut-offs of the PHES test and the CFF test. The third aim was to assess whether the PHES test could be replaced with a simpler version. Using receiver operating characteristic curve analyses, we were able to show that two subtests of the PHES test battery, namely the DST and SDT, can be used for rapid screening for mHE.
Our data confirm that the PHES test battery score varies depending on the age and educational background of subjects. Test performance decreased with advancing age. These findings are in line with the data reported for Germany, Italy, Spain, Poland, South Korea and Mexico [17,25-29]. In addition, test performance also decreased with lower educational background, in line with data from Poland, Spain, Italy, South Korea and Mexico but at variance with data obtained from Germany in 1999 [36] (Suppl. Table 10). The results obtained from healthy volunteers were compared in detail with German data and there was a significant difference between the groups for all subtests (p<0.001 to p=0.006) [17]. Healthy Turks needed less time than healthy Germans for the NCT-B and for the LTT, and made fewer errors in the LTT. However, Turks achieved lower numbers in the DST and needed more time to complete the SDT. NCT-A performance was similar between healthy Germans and Turks. Looking at the PSE sum scores (PHES), the mean sum scores were significantly lower (i.e. worse) in the Turkish population. This is interesting because, of the six parameters assessed in the PHES test battery, Turks actually performed better than Germans in three (NCT-B, LTT time and LTT error), while Germans performed better than Turks in only two (DST and SDT). It may suggest that the discriminatory power of the DST and SDT within the overall PHES test battery exceeds that of the other tests. This is supported by our finding that DST and SDT combined displayed a very high AUROC (0.974) for predicting mHE when compared to the PHES total score. When the DST was used alone, it again displayed a high AUROC of 0.925 for predicting mHE. These data suggest that using the DST and SDT instead of the whole battery can be considered, as one of the perceived disadvantages of the PHES test battery is the time needed to complete the full battery of tests (30 minutes on average).
A simpler psychometric test would certainly be an important adjunct for the assessment of mHE. Several attempts were made in this regard [37,38]. Our observation related to the DST and SDT is in line with the study by Riggio et al, who used a triple test battery consisting of the DST, SDT and LT tests and suggested that this triple test was as 'good' as the PHES test [37]. This triple test reached an AUROC of 0.981 in the current study, and our data suggest that the binary test serves the purpose of diagnosing mHE well enough. It must be considered, however, that depending on the cut-off used for a sum score given by these two sub- For the assessment of mHE the cut-off value for the PHES was set at ≤-5, in line with studies conducted in Germany, Spain, Poland, South Korea and Mexico [17,26-29]. The only exception is Amodio's study from Italy, in which the cut-off value was set at ≤-4 [25]. The mean overall score of the cirrhotic patient group was -3.54 ± 3.98 in our cohort.
This score was similar to that in the German study (-3.5 ± 4.7), but lower than in the Spanish (-1.4 ± 3.4) and Italian (-1.5 ± 3.7) studies. In a study from Italy the proportion of patients with only primary school education was 35%, compared to the 57% observed in our study [25]. Age and education should not affect the test results in our patients since the norms are adjusted for age and education. However, it has been shown before that patients with less educational training and patients of older age are more frequently diagnosed with mHE even when adjusted norms are used. This finding has been interpreted as the result of a lower cognitive reserve in these subjects.
The prevalence of mHE in cirrhotic patients was 29.8% according to the PHES test battery. This rate varies between 20% and 84% in different studies [1]. The results from Spain, South Korea and Portugal are similar to those of our study (30.7%, 25.6% and 34%, respectively) [26,28,39]. The differences may be due to the variability of the studied patient cohorts.
Supporting this, a rather high mHE prevalence according to PHES test (48%) was reported by Dhiman et al in a patient cohort that included Child-Pugh class C patients [32].
We also used the CFF test for diagnosing mHE in cirrhotic patients. The CFF test has been reported to be able to reflect both retinal functioning and cerebral cortex functions. CFF assessment appears to be a reliable, simple, convenient test, and test performance is reported not to be affected by variables such as age or educational background [30]. The latter is supported by results of the current study, which used a large cohort of healthy subjects with different educational backgrounds, and this property of the CFF test provides an important advantage for its use for mHE diagnosis. However, age significantly affected CFF test results in the current study, in line with several other studies [21,31,32]. Nevertheless, we used the fixed cut-off (39 Hz) suggested by Kircheis et al to evaluate the CFF results, to be able to compare our results with previous studies [21].
The present study found the mHE rate to be 27% with the CFF test. Similarly mHE was 21% in the study by Dhiman et al [32]. However, a study conducted in Spain reported a much higher rate (42%) which is likely due to a sicker patient cohort which included Child-Pugh class C patients (about 18%) as well [30]. In addition to that, in the Spanish study about 42% of the patients had alcoholic liver cirrhosis, and alcoholism is known to affect CFF test results [21,30].
An important question is the concordance between the PHES test battery and the CFF test. Cohen's kappa coefficient of concordance was 0.389, consistent with a significant albeit poor concordance. Several reasons may account for this, such as a lower sensitivity of the CFF test or a focus on a different characteristic of HE than the PHES test. Thus, any characteristic considered to indicate HE with one method might not be captured by the other method. Out of 37 patients diagnosed as having mHE according to the PHES test battery, 17 patients were not detected by the CFF test.
This difference is high and suggests that the CFF test fails to detect mHE with sufficiently high precision, as has been recently reported in a meta-analysis [40].
In conclusion, the present study provides Turkish norms for the PHES test battery. The PHES test results varied, in line with previous data, according to age and educational level. The EASL-AASLD guidelines recommend the use of two different methods for detection of mHE. This was done in the current study. Abnormal results were found in 29.8% using the PHES test and in 27.4% using the CFF test, but in only 16% with both methods. The use of two methods for diagnosing mHE increases specificity and may be important for research purposes. From a clinical perspective, however, it makes sense to diagnose and treat mHE early, as mHE is not associated with a good prognosis. Through its treatment we may increase patients' quality of life and prevent early loss of workforce as well as likely accidents and deaths. In this context, for the practicing clinician diagnosing mHE, sensitivity matters more than specificity. The use of a simple paper-pencil test can serve that purpose, and based on the data presented here we think that the combined use of the DST and SDT is well suited for this.

Declarations
Conflict of interest: The authors declare no competing interests.
Declaration of funding interests: The Critical Flicker Frequency device was kindly provided by ASSOS Pharmaceuticals throughout the study. We did not receive any financial support for conducting the study.