Quantitation of viral gene expression using RNA-sequencing
Although Shamonda virus averaged 10 reads per million human mapped reads (RPHM) in poly(A) selected RNA-seq, this is likely an artifact rather than true infection because the viral reads were detected in every sample analyzed, including the normal controls. In addition, manual BLAST showed that the reads called as Shamonda virus actually matched human sequences, and that a few repeat reads had been clonally amplified, resulting in the high read number. The next most commonly detected virus was human adenovirus C, with 1 RPHM (table 2). The mapped reads of other viruses were very low (under 1 RPHM, table 2) in poly(A) selected RNA-seq. This could be due to exclusion of viral RNA that is not polyadenylated. To detect virally encoded non-coding RNAs, we performed non-poly(A) RNA-seq using ribodepleted RNA libraries for our initial group 1 samples (5 controls and 12 IPF lungs). Non-poly(A) selected RNA-seq detected more viruses than poly(A) selected RNA-seq, including tick-borne encephalitis virus, herpesvirus 2 (HHV-2, HSV-2), Roseolovirus (HHV-6B) and EBV (HHV-4, table 3, table S3). However, there were no significant differences between control and IPF (table 3, table S3, S4). These data were confirmed by analysis of viral RNA expression in another non-poly(A) selected RNA-seq dataset (the third group dataset, table S5). Overall, none of the samples from either the control or IPF groups reached a virus detection threshold high enough to qualify as positive. We conclude that there are no viruses associated with IPF tissue samples (table 2, table S1).
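For readers who wish to reproduce the normalization, RPHM as used above is the number of virus mapped reads scaled to one million human mapped reads. The following is a minimal sketch of that arithmetic only; the function name and the read counts are hypothetical and are not taken from our pipeline or data.

```python
def rphm(viral_reads: int, human_mapped_reads: int) -> float:
    """Virus mapped reads per million human mapped reads (RPHM)."""
    if human_mapped_reads <= 0:
        raise ValueError("human_mapped_reads must be positive")
    if viral_reads < 0:
        raise ValueError("viral_reads must be non-negative")
    # Multiply before dividing to limit floating-point rounding.
    return viral_reads * 1_000_000 / human_mapped_reads


# Hypothetical counts for illustration only (not values from this study).
example_rphm = rphm(viral_reads=250, human_mapped_reads=25_000_000)
print(f"RPHM = {example_rphm:.2f}")
```

Because the denominator is the number of human mapped reads rather than the total library size, RPHM values are comparable between samples sequenced to different depths.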
Screening for EBV, HCV, HHV-7 and herpesvirus saimiri RNA using real-time RT-qPCR
To confirm our RNA-seq results, we performed serial RT-qPCR on the first group of specimens (12 IPF and 5 control lung RNAs). This was not performed on the second and third groups because we had only the data sets and not the RNA. EBV has two major gene expression programs, the latency-associated program and the lytic program, which are used differentially depending on cell type. Since it is not known which cell type might harbor EBV within IPF lung, and to avoid “lack of detection” errors due to EBV infection status, primers spanning the EBV latent genes EBNA1, Qp and LMP1, as well as the EBV lytic gene Zta, were employed for RT-qPCR. No EBV latent or lytic gene expression was detected using RT-qPCR, suggesting that neither the latent nor the lytic form of EBV was present in the lungs of IPF patients or the control group (data not shown). However, using primers that span the EBV-encoded noncoding small RNAs, EBER1 to EBER2, we detected a very low level of EBER expression in both the IPF and control specimens, with cycle threshold (Ct) [34] values over 33 cycles and with no significant difference between the two groups (figure 1A). These data are consistent with the analysis of the non-poly(A) selected RNA-seq.
Other ubiquitous herpesviruses have also been reported to be associated with IPF, including herpes simplex virus type 1 (HSV-1), HHV-6, -7 and -8 and cytomegalovirus (CMV) [2]. Our RNA-seq data detected sporadic and very low numbers of virus mapped reads per million human mapped reads (RPHM) for these viruses: HHV-5 with 1 RPHM in IPF lung and 2 RPHM in control lung, HHV-6 with 1 RPHM in control, and HHV-7 with 2 RPHM in IPF (table 2). RT-qPCR Ct values for these viruses were around 40, and therefore not reliable for quantification of these HHVs (data not shown). Chronic HCV infection has been implicated in liver fibrosis; however, it is still debatable whether HCV can cause pulmonary fibrosis. While some research indicates that HCV infection may play an important role in the pathogenesis of IPF [4, 5], others have not detected HCV RNA in IPF samples, despite detection in some specimens using ELISA [10, 35]. No HCV mapped reads were detected in any of our IPF or control lung specimens using RNA-seq (table S1). A nested real-time RT-qPCR assay with primers spanning the 5′-UTR of HCV [28, 36] detected very low levels of HCV transcripts, with Ct values over 30 cycles (figure 1B). Importantly, the ∆∆Ct for HCV was not significantly different between IPF and controls (figure 1B).
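The ∆∆Ct comparison referred to above can be illustrated with the standard comparative Ct calculation. The sketch below is not our analysis script: it assumes normalization to a reference gene (not named in this section) and roughly 100% amplification efficiency, so that one cycle corresponds to a two-fold difference. All function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Delta-Ct: target Ct normalized to a reference gene Ct (assumed)."""
    return ct_target - ct_reference


def delta_delta_ct(dct_cases: list[float], dct_controls: list[float]) -> float:
    """Mean delta-Ct of cases minus mean delta-Ct of controls."""
    return mean(dct_cases) - mean(dct_controls)


def fold_change(ddct: float) -> float:
    """Relative expression, assuming a doubling of product per cycle."""
    return 2.0 ** (-ddct)


# Hypothetical (target Ct, reference Ct) pairs; not data from this study.
ipf_pairs = [(32.0, 20.0), (33.0, 21.0), (31.5, 19.5)]
control_pairs = [(32.5, 20.5), (31.0, 19.0)]

ipf_dct = [delta_ct(t, r) for t, r in ipf_pairs]
control_dct = [delta_ct(t, r) for t, r in control_pairs]

ddct = delta_delta_ct(ipf_dct, control_dct)
print(f"ddCt = {ddct:.2f}, fold change = {fold_change(ddct):.2f}")
```

A ∆∆Ct near zero corresponds to a fold change near 1, that is, no difference between groups. With Ct values above 30, as observed here, small differences should be interpreted with caution.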
More recently, Folcik et al. reported that IPF is associated with herpesvirus saimiri but not with other herpesviruses such as EBV, KSHV, CMV or HSV I/II [30]. They detected herpesvirus saimiri DNA and RNA in all 13 IPF cases and in none of their controls. Herpesvirus saimiri is a member of the rhadinovirus genus, which also includes Kaposi’s sarcoma-associated herpesvirus, and can infect humans and squirrel monkeys without causing disease. Around 4.0-7.3% of humans are seropositive and express viral proteins such as viral cyclin D [37]. Although no substantial herpesvirus saimiri reads were detected in any of the IPF or control specimens using RNA-seq (table 2 & S1), we still performed RT-qPCR to assess expression of herpesvirus saimiri using primers against viral cyclin D1 and viral ORF73 (a conserved viral gene). We did not detect significant expression of ORF73 in IPF patient samples compared to controls (data not shown). We observed high expression of human cyclin D1 (figure 2B) and very low expression of viral cyclin D1 (Ct value over 30, figure 2A) in both IPF and control samples, indicating a lack of association between herpesvirus saimiri and IPF.
HERV-K gene expression and coverage in IPF patients
HERV sequences make up about 4.9% of the human genome. HERV-K has been studied in autoimmune disorders and oncogenesis, yet to date we are not aware of any literature assessing its possible role in pulmonary fibrosis. Recently, RT-PCR results have suggested that HERV-K env mRNA was increased in PBMC and skin biopsies of morphea/localized scleroderma [38]. This study suggests that HERV-K env may be functionally linked to fibrosis. HERV-K gene expression could theoretically promote IPF through cell stress, and HERV-K expression is reported to be higher with EBV infection [39]. Therefore, we evaluated whether HERV-K genes are upregulated in IPF lung. Notably, of the viruses analyzed in poly(A) selected RNA-seq, HERV-K had the highest read numbers (23 to 83 HERV-K mapped reads per million human mapped reads in both IPF and control samples) (figure 3A & table S2). Statistical analysis showed about a 2-fold increase in the 11 IPF patient samples compared with the 5 controls in group 1 (figure 3A & 3B). However, no statistical difference was evident between IPF and controls in the second group (figure 3A & 3B). These data were confirmed by non-poly(A) selected RNA-seq in the initial group and the third group. Non-poly(A) selected RNA-seq detected more HERV mapped reads than poly(A) selected RNA-seq (table 3, table S4, table S5). Overall, we were not able to establish an association between HERV-K gene expression and IPF.
Quantitative RT-PCR of the HERV-K env and long terminal repeat (LTR) regions showed that the expression levels of env and LTR were higher in IPF than in controls (two-fold difference, figure 3D), which corroborates the RNA-seq data. Next, strand-specific nested RT-PCR was performed with primers spanning the HERV-K env and LTR regions in group 1. The primers were originally designed to detect viral 1x env splicing transcripts [31]. Since HERV-K can be transcribed from the LTR in either or both directions, yielding sense and anti-sense strands, we performed strand-specific RT-PCR to detect the plus strand and the minus strand using the forward (LTR-Fwd) or reverse (LTR-Rev) primer for reverse strand transcription of the LTR. As shown in figure 3C, we found no statistical difference in expression of env and LTR from either direction between IPF and controls. We observed env spliced transcripts of several different sizes. Eleven of 12 (91.7%) IPF samples were env positive, compared to 3 of 5 (60%) controls, and the majority of env transcripts were large in IPF (9 of 11), compared with 1 of 3 in controls (figure 3C). In summary, spliced env appears to be preferentially expressed in IPF, but we do not yet know whether the large env plays a role in IPF pathogenesis.