Demographic data and clinical features
A total of 249 subjects with a mean age of 42.85 ± 0.77 years participated in the study. Of these, 45 (18.1%) were female and 204 (81.9%) were male. The mean age of the females was 46.02 ± 2.02 years, while that of the males was 42.15 ± 0.82 years; the age difference between males and females was not significant. The clinical manifestations observed in the patients were as follows: fatigue 45 (18.1%), dizziness 24 (9.6%), arthralgia 22 (8.8%), abdominal discomfort 21 (8.4%), fever 20 (8%), decreased appetite 19 (7.6%), icterus 17 (6.8%), dark urine 16 (6.4%), nausea 15 (6%), clay-colored stools 11 (4.4%), vomiting 10 (4%), and diarrhea 8 (3.2%).
The mean HCV viral load was 5,198,244 ± 137,139.3 copies/mL (min = 100 copies/mL, max = 29,257 × 10^4 copies/mL). The log HCV RNA in the patients was 5.44 ± 0.08 copies/mL: 5.18 ± 0.19 copies/mL in females and 5.5 ± 0.09 copies/mL in males. In the patients, the mean serum levels of ALT, AST, and ALP were 90.41 ± 7.25, 77.35 ± 4.7, and 228.69 ± 12.86 IU/L, respectively. In the control group, the mean serum levels of ALP, ALT, and AST were 305.46 ± 7.55, 27.86 ± 1.23, and 33.22 ± 6.31 IU/L, respectively.
Genotype 3 was the most prevalent genotype, followed by genotype 1: 116 (46.6%) and 106 (42.6%) patients, respectively. A total of 13 individuals (5.2%) had genotype 2, and only three individuals (1.2%) had a mixed genotype. The genotype was undetectable in 11 patients (4.4%). The most common risk factors for infection were imprisonment, a positive family history, sexual intercourse, bloodletting, and tattooing. The demographic data and clinical features of the patients according to the different HCV genotype groups are shown in Table 1.
HCV viral load associations
Log HCV RNA did not differ significantly between females and males (5.19 ± 0.19 copies/mL versus 5.5 ± 0.09 copies/mL, respectively; p=0.1). The log viral load in patients with hepatitis C decreased with age; however, this finding was not statistically significant (p=0.1). There was a statistically significant association between log HCV RNA and genotype group (p=0.04): log HCV RNA differed significantly between the genotype 1 group and the undetectable genotype group, and also between the mixed and undetectable genotype groups. Log HCV viral load increased significantly with increasing ALT serum level (r=0.2, p=0.02) and with decreasing platelet count (r=-0.25, p=0.03). However, there was no significant association between log HCV viral load and AST, ALP, or Alb serum levels. Among the clinical manifestations, log HCV RNA was significantly higher in patients who had arthralgia, fatigue, fever, vomiting, or dizziness (p=0.02, p=0.0001, p=0.009, p=0.04, and p=0.02, respectively).
HCV genotypes associations
No significant association was found between HCV genotypes and gender, age, serum levels of AST, ALT, Alb, and ALP, and platelet count. However, 50% of the subjects who had dark urine and 35.3% of those with icterus were of genotype 3 (p=0.008 and p=0.006, respectively). Table 2 shows the association between genotypes.
Diagnostic values of liver biomarkers
To determine the optimal cutoff points of the liver biomarkers for the detection of HCV-infected patients, ROC curves were constructed, and the cutoff that maximized the sum of sensitivity and specificity was selected. As shown in Fig 1a, the ROC curve analysis revealed that an AST cutoff of >31 IU/L as a surrogate marker for the detection of HCV infection had a sensitivity of 87.7%, a specificity of 84.36%, a positive predictive value (PPV) of 44.6%, and a negative predictive value (NPV) of 98%. The optimal ALT cutoff for the detection of HCV infection was >34 IU/L, with a sensitivity, specificity, PPV, and NPV of 83.51%, 81.11%, 36%, and 97.5%, respectively (Fig 1b). For ALP, the best cutoff point was ≤246 IU/L, with a sensitivity, specificity, PPV, and NPV of 72.06%, 42.81%, 8.3%, and 95.5%, respectively (Fig 1c). The area under the curve (AUC) values for AST, ALT, and ALP were 0.876, 0.812, and 0.59, respectively. All the values were statistically significant (p≤0.0001, Fig 1).
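The cutoff selection described above can be illustrated with a minimal sketch. It assumes the rule "a value above the cutoff is test-positive" and picks the observed value that maximizes sensitivity plus specificity. The function names and the ten AST values are made up for illustration and are not the study data or the software used in the analysis.

```python
def diagnostic_metrics(values, labels, cutoff):
    """Treat value > cutoff as test-positive; return (sensitivity, specificity, PPV, NPV)."""
    tp = sum(1 for v, y in zip(values, labels) if v > cutoff and y == 1)
    fp = sum(1 for v, y in zip(values, labels) if v > cutoff and y == 0)
    fn = sum(1 for v, y in zip(values, labels) if v <= cutoff and y == 1)
    tn = sum(1 for v, y in zip(values, labels) if v <= cutoff and y == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # PPV/NPV are undefined when no subject falls on that side of the cutoff.
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return sens, spec, ppv, npv


def best_cutoff(values, labels):
    """Return the observed value that maximizes sensitivity + specificity."""
    best_c, best_sum = None, -1.0
    for c in sorted(set(values)):
        sens, spec, _, _ = diagnostic_metrics(values, labels, c)
        if sens + spec > best_sum:
            best_c, best_sum = c, sens + spec
    return best_c


# Hypothetical AST values (IU/L) and infection status (1 = infected, 0 = control).
ast_values = [20, 25, 28, 30, 31, 35, 40, 60, 80, 120]
infected = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

cutoff = best_cutoff(ast_values, infected)
sens, spec, ppv, npv = diagnostic_metrics(ast_values, infected, cutoff)
print(cutoff, sens, spec, ppv, npv)
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of infected subjects in the sample, so they do not carry over directly to populations with a different prevalence.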
Multivariable linear regression analysis for prediction of HCV viral load
A multiple linear regression analysis with a backward approach was performed to develop a mathematical model for predicting HCV viral load. All variables, including HCV genotype, serum ALT, and clinical manifestations such as arthralgia, fatigue, fever, vomiting, and dizziness, were entered into the model. The genotype variable was coded as three indicator variables, with the mixed genotype as the reference category. To select the best model, the backward method with a removal probability of 0.1 was used. Finally, based on the R² coefficient and after eliminating variables with multicollinearity, ALT, genotypes 1 and 3, and fatigue were identified as significant predictors. Tolerance values among these four variables ranged from 0.2 to 0.9, and variance inflation factors ranged from 1 to 5. The regression coefficients are shown in Table 3. Genotype 2, arthralgia, fever, vomiting, and dizziness were excluded from the model (p>0.1). The final regression model is given in Eq. (1):
Log viral load = 7.69 - 1.01 × G3 - 0.7 × G1 + 0.002 × ALT - 0.86 × fatigue (1)
As can be seen in Eq. (1), if the genotype of a patient is 3 or 1 (compared with the mixed genotype), the mean predicted log viral load is reduced by 1.01 and 0.7 units, respectively. Similarly, in the case of patients who exhibit symptoms of fatigue, the model predicts that the log of viral load would be increased by 0.86 units. In this model, the coding is as follows: for patients with genotype 1, set G1 = 1 and G3 = 0, and vice versa for genotype 3. If the patient has fatigue, set the fatigue input to 1; otherwise, set it to 0. The evaluation of the regression assumptions is summarized in Table 4.
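The coding rules above can be sketched as a small function. This is only an illustration: the coefficients are copied from Eq. (1) as printed, including the sign of the fatigue term, and ALT is assumed to be in IU/L. The function name and the string labels for the genotype are hypothetical, and genotypes other than 1, 3, and mixed are left out of the sketch because their coding is not described here.

```python
def predict_log_viral_load(genotype, alt, fatigue):
    """Evaluate Eq. (1) as printed.

    genotype: "1", "3", or "mixed" (mixed is the reference category).
    alt:      serum ALT (assumed to be in IU/L).
    fatigue:  True if the patient reports fatigue, else False.
    """
    if genotype not in ("1", "3", "mixed"):
        raise ValueError("this sketch only covers genotypes 1, 3, and mixed")
    g1 = 1 if genotype == "1" else 0  # indicator for genotype 1
    g3 = 1 if genotype == "3" else 0  # indicator for genotype 3
    f = 1 if fatigue else 0           # indicator for fatigue
    return 7.69 - 1.01 * g3 - 0.7 * g1 + 0.002 * alt - 0.86 * f


# Example: a genotype 3 patient with ALT = 90 IU/L and no fatigue.
print(round(predict_log_viral_load("3", 90, False), 2))
```

For the reference category (mixed genotype) with no fatigue, the prediction reduces to the intercept plus the ALT term.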
To evaluate the validity of the model, we conducted a 10-fold cross-validation (k = 10): each time, one of the k subsets is used as the test set and the other k-1 subsets are combined to form the training set. The average error across all k trials is then computed (Tables 5 and 6). Two popular measures of model performance in cross-validation are the root mean square error (RMSE) and the mean absolute error (MAE). In our model, the RMSE was 1.03 and the MAE was 0.85. Figure 2 shows the log of viral load versus the fitted values.
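The cross-validation procedure described above can be sketched as follows. To stay self-contained, the sketch fits a one-predictor least-squares line rather than the multivariable model of Eq. (1), and it runs on synthetic data generated inside the script; all names and numbers in it are assumptions for illustration, not the study data or results.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one predictor, for brevity)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_cv(xs, ys, k=10, seed=0):
    """Return (RMSE, MAE), each averaged over the k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test sets
    rmses, maes = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [ys[i] - (a + b * xs[i]) for i in test]
        rmses.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
        maes.append(sum(abs(e) for e in errors) / len(errors))
    return sum(rmses) / k, sum(maes) / k


# Synthetic example: a noisy linear relation between ALT and log viral load.
rng = random.Random(1)
alt = [rng.uniform(10, 300) for _ in range(100)]
log_vl = [5.0 + 0.002 * x + rng.gauss(0, 1) for x in alt]

rmse, mae = k_fold_cv(alt, log_vl, k=10)
print(round(rmse, 2), round(mae, 2))
```

Because the RMSE squares each error before averaging, it is never smaller than the MAE on the same fold and is more sensitive to occasional large errors.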