Identification of Cryptic Putative IRESs Initiating the Translation of Nonstructural Proteins Encoded by the HRV16 Genome

Cap-dependent initiation of translation is the canonical mechanism adopted by eukaryotic cells. Internal ribosome entry site (IRES)-dependent translation is a mechanism distinct from 5′ cap-dependent translation. IRES elements are located mainly in the 5′-untranslated regions (UTRs) of viral and eukaryotic mRNAs. In addition, IRESs are found in the coding regions of some viral and eukaryotic genomes, where they initiate the translation of functional truncated isoforms. Here, using IRES-initiated protein expression, bicistronic vectors and ribosome profiling of human rhinovirus 16 (HRV16), we found that the coding region of the nonstructural proteins P2 and P3 contained 5 putative IRES elements. These 5 putative IRESs were located within nucleotides 4286-4585, 5002-5126, 6245-6394, 6619-6718 and 6629-6778 and initiated green fluorescent protein (GFP) expression in vitro. This alternative mechanism might offer an effective and economical way to save the time and raw materials required to synthesize the full-length polyprotein.


Introduction
The canonical eukaryotic translation mechanism is 5′ cap-dependent (m7GpppN) [1]. However, picornaviruses have adopted an alternative IRES-dependent translation mechanism to initiate polyprotein translation [2][3][4][5][6]. Studies show that the mRNAs of multiple viruses and a minority (<10%) of the mRNAs in eukaryotic cells contain IRES elements and that classical internal ribosome entry sites (IRESs) are located in the 5′-untranslated region (UTR). According to the secondary structure of their host RNA, IRESs can be divided into four types. Type I IRESs are found in Enterovirus and Rhinovirus genomes [7,8]. Type II IRESs are found in Cardiovirus and Aphthovirus genomes [9]. HCV and HCV-like IRESs are found in some members of the Flaviviridae and Picornaviridae families, such as hepatitis C virus [10], classical swine fever virus [11,12], porcine teschovirus-1 [13], and simian virus 2 [14]. Intergenic region (IGR) IRESs were originally found in the cricket paralysis virus genome and exist widely in members of the Dicistroviridae family [15]. In contrast, IRESs in eukaryotic cells are difficult to classify into types because of their diverse structures. In addition to being located in the 5′-UTRs of eukaryotic cellular genes [16][17][18], some IRESs are also found in the coding regions of eukaryotic cellular genes, such as those encoding the 14-3-3 and prion proteins [19,16]. In addition, our previous study showed that multiple putative IRESs are located in the coding region of the Coxsackievirus B type 3 (CVB3) genome (unpublished). IRESs in the coding regions of viral genomes might initiate the translation of specific proteins during a particular stage of the virus life cycle. The Picornaviridae family is one of the largest known virus families and includes many important human and animal pathogens.
Picornaviruses are nonenveloped RNA viruses possessing a single-stranded, positive-sense (+) RNA genome (7-8 kb) composed of a 5′-NTR, an open reading frame (ORF), a 3′-NTR and a poly(A) tail. The ORF is translated into a large polyprotein, which is proteolytically cleaved by viral proteases (2A, 3C) to release 4 structural proteins (VP4, VP2, VP3, and VP1) and 7 nonstructural proteins (2Apro, 2B, 2C, 3A, 3B, 3Cpro, 3Dpol and, in some genera, L) [20,21].
Human rhinovirus 16 (HRV16) belongs to the Rhinovirus genus in the family Picornaviridae. The HRV16 genome is a single-stranded positive-sense RNA genome with a length of approximately 7.5 kb [21].
According to the viral genome structure and the classical virus replication mechanism, nonstructural proteins associated with viral replication, such as 2Apro, 2B, 2C, 3A, 3B, 3Cpro and 3Dpol, are synthesized after the structural proteins. However, nonstructural proteins are the central players in the RNA replication and transcription machinery during the life cycle of RNA viruses. 3Dpol, an RNA-dependent RNA polymerase (RdRp), is indispensable for both replication and transcription of the viral genome. 3Dpol uses VPg (3B) as a primer to initiate the replication process. Both 2Apro and 3Cpro cleave the polyprotein to form functional viral components; the molecular weight of this large polyprotein is approximately 240 kDa. Synthesis of such a large protein is certain to affect the efficiency of viral replication. Therefore, we reasoned that there may be a more effective mechanism by which viruses can synthesize proteins from the genome. Considering this possibility in combination with results from previous studies, we hypothesized that IRESs are contained within the coding region of the viral genome in addition to the 5′ noncoding region. HRV16 was therefore used as a model system to search for putative IRES elements in the viral genome.
To test this hypothesis, a complete experimental scheme was designed to search for putative IRESs and then verify their function. We found 5 putative IRESs, with lengths ranging from 100 bp to 300 bp, in the coding region of the nonstructural proteins P2 and P3 in the viral genome. These findings suggest that HRV16 utilizes putative IRESs within the coding region to initiate the translation of nonstructural proteins.

Western blotting
Cells were collected with cell scrapers and then centrifuged at 1000 × g for 3 min. Cell pellets were washed with phosphate-buffered saline (PBS) and then lysed with cell lysis buffer (Beyotime) on ice for 40 min. Cell supernatants were harvested by centrifugation at 4°C and 15000 × g for 10 min. Protein samples were separated by SDS-PAGE and then transferred to nitrocellulose (NC) membranes (GE). NC membranes were blocked with 5% skim milk in PBS at room temperature for 2 h and then incubated with anti-GFP (Proteintech), anti-GRP 78 (Abcam) and anti-β-actin (Abcam) monoclonal antibodies overnight on a shaker at 4°C. Next, NC membranes were incubated with a horseradish peroxidase-conjugated anti-mouse IgG secondary antibody (Abcam) at room temperature for 1 h. Specific protein bands on the NC membranes were detected with an enhanced chemiluminescence (ECL) detection kit (PerkinElmer).

Luciferase assay
BHK-21 cells were cultured in 96-well plates to 70-80% confluence and were then transfected with bicistronic luciferase plasmids. Luciferase activity was measured with a Dual-Luciferase Reporter Gene Assay Kit (Beyotime) according to the manufacturer's instructions at 24 h post-transfection. The ratio of F-Luc activity to R-Luc activity (F-Luc/R-Luc) was used as a measure of the translation initiation ability of the putative IRES.
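The ratio described above can be illustrated with a minimal Python sketch. The function name, the per-well averaging, and the example readings are assumptions made for illustration only; they are not taken from the kit protocol or from the authors' analysis.

```python
from statistics import mean


def fold_over_negative(fluc, rluc, neg_fluc, neg_rluc):
    """Return the mean F-Luc/R-Luc ratio of a sample relative to the negative control.

    Each argument is a list of raw luciferase readings, one value per
    replicate well (hypothetical layout). Ratios are computed well by well
    and then averaged, which is one of several reasonable conventions.
    """
    if len(fluc) != len(rluc) or len(neg_fluc) != len(neg_rluc):
        raise ValueError("F-Luc and R-Luc lists must have the same length")
    sample_ratio = mean(f / r for f, r in zip(fluc, rluc))
    control_ratio = mean(f / r for f, r in zip(neg_fluc, neg_rluc))
    return sample_ratio / control_ratio


# Hypothetical readings: the sample ratio is 4.0 and the control ratio is 1.0.
example_fold = fold_over_negative([400, 800], [100, 200], [100, 100], [100, 100])
```

Normalizing F-Luc to R-Luc within each well corrects for differences in transfection efficiency before the comparison with the negative control is made.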

RNA preparation and quantitative reverse transcription-PCR (qRT-PCR)
Bicistronic reporter plasmids containing the final truncated putative positive IRES sequences were transfected into BHK-21 cells. Total RNA was extracted with TRIzol reagent (Sigma-Aldrich) 24 h post-transfection and reverse transcribed to cDNA with a reverse transcription kit (Takara). Two pairs of specific primers targeting the R-Luc and F-Luc genes were designed with Oligo software (supplementary materials 2). qRT-PCR was performed with SYBR Premix Ex Taq II (Takara) using the primers described above.
Endoplasmic reticulum stress (ERS) has been reported to increase the expression of glucose-regulated protein 78 (GRP 78) [22,23]. BHK-21 cells were treated with 0.25 μM thapsigargin (TG) (Sigma-Aldrich) for 12 h, and the expression of GRP 78 was then detected.
BHK-21 cells were cultured in 96-well plates to 70-80% confluence and were then transfected with the bicistronic luciferase plasmids. The BHK-21 cells were treated with or without 0.25 μM TG at 12 h post-transfection. Luciferase activity was detected at 24 h post-transfection.

Results
3.1. Identification of putative IRESs within coding regions in the HRV16 genome initiating the translation of nonstructural proteins
According to previous studies [19,16], the target genome sequence was inserted into pEGFP-N1. After transfection, GFP-fused proteins translated via de novo synthesis and dependent on an IRES-mediated mechanism were identified by Western blotting with an anti-GFP antibody. If the same putative IRES was present in two or more inserted fragments, bands of the same sizes would be detected in the adjacent lanes on the Western blot. To investigate the presence of putative IRESs in the nonstructural protein coding region of the HRV16 genome, the ORF sequences from each start codon (AUG) to the C-terminal codon of P2 or P3 were cloned separately into the vector pEGFP-N1. If two start codons were very close to each other, only the longer sequence containing both start codons was selected for cloning into pEGFP-N1. Thus, 5 sequences in the P2 region and 8 sequences in the P3 region were individually cloned into the vector pEGFP-N1; the resulting plasmids were designated pP2(2969-4861), pP2(3632-4861), pP2(3926-4861), pP2(4256-4861), pP2(4586-4861), pP3(4586-7084), pP3(4874-7084), pP3(5177-7084), pP3(5672-7084), pP3(5993-7084), pP3(6164-7084), pP3(6395-7084) and pP3(6596-7084). The 5 sequences in the P2 coding region were nt 2969-nt 4861, nt 3632-nt 4861, nt 3926-nt 4861, nt 4256-nt 4861 and nt 4586-nt 4861, and the 8 sequences in the P3 coding region were nt 4586-nt 7084, nt 4874-nt 7084, nt 5177-nt 7084, nt 5672-nt 7084, nt 5993-nt 7084, nt 6164-nt 7084, nt 6395-nt 7084 and nt 6596-nt 7084. Because the full-length 5′ termini of P2 and P3 did not contain an AUG, the first AUG in the P2 sequence was selected in the VP1 coding region, and the first AUG in the P3 sequence was selected in the 2C coding region (Fig. 1a). The molecular weights of the GFP-fusion proteins corresponding to the (Fig. 1b).
Bands corresponding to the same molecular weights, namely, approximately 55 kDa, 40 kDa and 38 kDa, were detected in many adjacent lanes (Fig. 1b). After removal of the GFP tag, the nucleotide sequences corresponding to the remaining parts of the three proteins were nt 4256-nt 4861, nt 4481-nt 4861 and nt 4586-nt 4861, respectively. The 300-bp sequences upstream of these nucleotide sequences were nt 3956-nt 4255, nt 4181-nt 4480 and nt 4286-nt 4585, respectively. We concluded that the putative IRESs in the P2 region were located within the nt 3956-nt 4255, nt 4181-nt 4480 and nt 4286-nt 4585 sequences. Multiple protein bands were also detected after transfection of pP3(4586-7084), pP3(4874-7084), pP3(5177-7084), pP3(5672-7084), pP3(5993-7084), pP3(6164-7084), pP3(6395-7084) and pP3(6596-7084) (Fig. 1c). Specifically, 7, 7, 8, 7, 6, 7, 5 and 4 bands were found in the lanes of samples transfected with pP3(4586-7084), pP3(4874-7084), pP3(5177-7084), pP3(5672-7084), pP3(5993-7084), pP3(6164-7084), pP3(6395-7084) and pP3(6596-7084), respectively. Bands corresponding to the same 6 molecular weights, namely, approximately 100 kDa, 75 kDa, 70 kDa, 58 kDa, 38 kDa and 32 kDa, appeared in many adjacent lanes (Fig. 1c). After removal of the GFP tag, the nucleotide sequences corresponding to the remaining parts of the six proteins were nt 5177-nt 7084, nt 5672-nt 7084, nt 5993-nt 7084, nt 6395-nt 7084, nt 6719-nt 7084 and nt 6929-nt 7084, respectively. The 300-bp sequences upstream of these nucleotide sequences were nt 4877-nt 5176, nt 5372-nt 5671, nt 5693-nt 5992, nt 6095-nt 6394, nt 6419-nt 6718 and nt 6629-nt 6928, respectively. We concluded that the putative IRESs in the P3 region were located within the nt 4877-nt 5176, nt 5372-nt 5671, nt 5693-nt 5992, nt 6095-nt 6394, nt 6419-nt 6718 and nt 6629-nt 6928 sequences. In summary, we concluded that 9 putative IRESs might be located in the nonstructural protein coding region of the HRV16 genome.
3.2. Confirmation of putative IRES elements within the nonstructural protein coding region of the HRV16 genome through bicistronic vectors
The above results indicate that 9 putative IRESs are located upstream of 9 potential IRES-dependent protein sequences. To search for putative IRES sequences, following previous studies [16,25,26], a dual-luciferase reporter plasmid (p-IRES.CHECK) with a hairpin structure (ΔGcal = -74.4 kcal/mol) between the R-Luc and F-Luc sequences was constructed (Fig. 2a and 2b). We established a criterion of an F-Luc/R-Luc ratio more than 3-fold greater than that of the negative control after transfection of a plasmid containing a putative IRES sequence (Fig. 2a and 2b, right panels). The above plasmids were transfected into BHK-21 cells for 24 h, and the F-Luc/R-Luc ratio of each reporter vector was calculated relative to that of the negative control. (Fig. 4e, right panel). The F-Luc/R-Luc ratio in cells transfected with pP3-IRES-(6629-6728) was less than 0.7-fold that in cells transfected with pP3-IRES-(6629-6778). Therefore, we concluded that the putative IRES is located within a 150-nucleotide region spanning nucleotides 6629 to 6778. In summary, we found that 5 putative IRESs were located at nt 4286-nt 4585 in the 2C region, and at nt 5002-nt 5126, nt 6245-nt 6394, nt 6619-nt 6718 and nt 6629-nt 6778 in the P3 region. The positions of the 5 putative IRESs in the HRV16 genome are shown in Fig. 5.
3.5. Verification of the function of the putative IRESs
To verify the function of these putative IRESs in initiating protein expression in vitro, p-IRES.CHECK-GFP was constructed from p-IRES.CHECK by inserting the hairpin structure downstream of R-Luc and replacing the F-Luc gene with the GFP gene (Fig. 6a).
The abovementioned putative IRESs were inserted between the hairpin and GFP genes to generate the vectors pP2-IRES(4286-4585)-GFP, pP3-IRES(5002-5126)-GFP, pP3-IRES(6245-6394)-GFP, pP3-IRES(6619-6718)-GFP and pP3-IRES(6629-6778)-GFP. The EMCV IRES sequence was inserted to generate pEMCV-GFP as the positive control vector. A randomly shuffled (scrambled) version of the putative IRES sequence was inserted between the hairpin structure and the GFP gene to generate the negative control plasmid. After transfection into 293T cells for 24 h, GFP proteins in each group were detected through Western blotting with an anti-GFP antibody. Except for the nt 6245-nt 6394 and scrambled sequences, all sequences, including nt 4286-nt 4585, nt 5002-nt 5126, nt 6619-nt 6718 and nt 6629-nt 6778, effectively initiated GFP expression (Fig. 6).
3.6. Ribosome profiling of the HRV16 genome
Cycloheximide stalls ribosomes on mRNA by blocking translation elongation. To understand the ribosome profile of the HRV16 genome and further confirm the putative IRES sequences, H1-HeLa cells were treated with cycloheximide at 21 and 24 h post-infection (h.p.i.). The ribosome-protected segments were subjected to next-generation sequencing. RiboSeq reads mapped to the HRV16 genome were counted (Supplemental materials 3). The ribosome profiling results showed that successive nucleotide sequences with more than 150 reads covered the positions of the 5 putative IRESs described above (Table 1). The locations were largely consistent with the genomic locations (Fig. 7). This consistency, obtained by a bioinformatic approach, provided additional evidence for the biological function of the putative IRESs verified above.
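The read-coverage comparison described for the ribosome profiling data can be sketched as follows. This is a minimal illustration that assumes a per-nucleotide read count is available as a plain list; the function names and the toy coverage values are hypothetical, and the authors' actual counting pipeline is not described in this section.

```python
def high_coverage_runs(coverage, min_reads=150):
    """Return (start, end) pairs of consecutive positions with reads > min_reads.

    coverage is a list of read counts, where index 0 corresponds to nt 1.
    Returned coordinates are 1-based and inclusive.
    """
    runs = []
    start = None
    for pos, reads in enumerate(coverage, start=1):
        if reads > min_reads:
            if start is None:
                start = pos  # open a new run
        elif start is not None:
            runs.append((start, pos - 1))  # close the current run
            start = None
    if start is not None:
        runs.append((start, len(coverage)))  # run extends to the last position
    return runs


def overlaps(run, region):
    """Return True if two 1-based inclusive intervals share at least one position."""
    return run[0] <= region[1] and region[0] <= run[1]


# Toy coverage vector (hypothetical values, not taken from the RiboSeq data).
toy_runs = high_coverage_runs([0, 200, 300, 10, 151, 151])
```

With real data, each run could be tested with `overlaps` against the putative IRES intervals reported above, for example `(4286, 4585)`.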

Discussion
Translation initiation is a key step in protein synthesis in living cells [27,28]. The 5′-UTR of most eukaryotic mRNAs contains a cap structure that interacts with ribosomes to initiate translation [28]. The cap-dependent translation initiation mechanism is the canonical mechanism and is used by eukaryotes and most viruses. IRES elements were initially found in the 5′-UTR of the EMCV genome [29]. Viruses utilize an IRES-dependent translation mechanism to synthesize viral proteins [30] and shut down the 5′ cap-dependent translation initiation mechanism of host cells in the endoplasmic reticulum. Approximately 10% of eukaryotic mRNAs contain IRES elements, which are related to cell growth, maturation, apoptosis, stress, and cell cycle regulation [31,32,30]. IRESs are RNA regions with certain structures that can recruit eukaryotic ribosomes and then initiate translation under conditions in which cap-dependent mechanisms are suppressed, such as DNA damage [33,34] and heat shock [35]. In general, most known IRESs are located in the 5′-UTRs of mRNAs. However, a few IRESs are located in the coding regions of viral or eukaryotic genomes. In murine hepatitis virus (MHV), an IRES is located in mRNA 5; its 280 nucleotides span ORF 5a and ORF 5b, and its 3′ border comprises the initiation codon of ORF 5b [36].
GFP has generally been used as a tag in protein expression applications. Green fluorescence can be observed through fluorescence microscopy to indicate the expression of the fusion protein. In addition, a low-molecular-weight protein fused with GFP has an increased molecular weight and can be detected with an anti-GFP antibody, obviating the need for a monoclonal antibody against a specific protein. As a previous study showed, if one gene contains several putative IRESs, the fusion proteins whose translation is initiated by the putative IRESs can be detected in the same lane by Western blotting with an anti-GFP antibody. In this research, the Western blots showed 9 bands corresponding to truncated proteins that had the same molecular weight and were located in the same position in multiple lanes, indicating that these proteins might be expressed through an IRES-dependent mechanism.
The observed molecular weight of the protein expressed by pP2(2969-4861) was close to the predicted molecular weight (S Fig. 1). Such insignificant differences in size might be difficult to distinguish by SDS-PAGE separation. The molecular weight of 3ABCD-GFP (111.8 kDa) was less than that of the predicted full-length fusion protein (121.8 kDa) (Fig. 1c). We concluded that this discrepancy was due to cleavage by the viral protease 3Cpro, because the protein expressed by pP3(4586-7084) contained a 3Cpro cleavage site at the junction of proteins 2C and 3A. Upon treatment with 5 μM or 10 μM rupintrivir, the predicted full-length fusion protein expressed by pP3(4586-7084) was detected. In addition, the inhibition of 3Cpro activity depended on the concentration of rupintrivir (S Fig. 2). The region encoding the 3ABCD protein spanned nucleotides 4862-7084 in the HRV16 genome, and the predicted full-length fusion protein expressed from pP3(4874-7084) did not contain the 3Cpro cleavage site at the junction of proteins 2C and 3A. Additionally, the 25-kDa bands observed in every lane might correspond to free GFP cleaved from the fusion protein (Fig. 1b and 1c) [37].
In previous studies, the nucleotide sequence lengths of IRESs in coding regions were generally found to be less than 300 bp [36,16]. To test the authenticity of the putative candidate IRES elements, the putative IRES sequences with a length of 300 bp were cloned into a bicistronic expression vector containing the R-Luc and F-Luc genes. In addition, the classical EMCV IRES was inserted into a bicistronic expression vector to generate a positive control vector, similar to the method used in other research [38]. A hairpin structure (ΔGcal = -74.4 kcal/mol) was inserted between the R-Luc and F-Luc genes to guarantee that the F-Luc gene was translated without interference from the R-Luc gene. As a barrier, the stable hairpin structure prevented ribosomes from reading through the stop codon of the R-Luc ORF but did not affect the expression of downstream genes [39,40,16]. The R-Luc gene in the bicistronic expression vector was translated in a cap-dependent manner. In contrast, F-Luc gene expression was dependent on the inserted nucleotide sequence. Putative IRES activity was represented by the F-Luc/R-Luc ratio relative to that in the negative control cells. According to our previous research (unpublished), the criterion for a true putative IRES was an F-Luc/R-Luc ratio at least 3-fold greater than that in the negative control cells. The results showed that 5 putative IRES elements were located in the nonstructural protein coding region of the HRV16 genome. Deletion analysis was conducted to map the ranges of the 5′ and 3′ boundaries of these putative IRESs. If the putative IRES activity of the truncated isoform was greater than 0.7-fold that of the intact nucleotide sequence, the putative IRES element was deemed to be located in the truncated region.
According to this criterion, the putative IRES activity of nucleotide sequence 4286-4585 was dependent on nt 4286-nt 4435 and nt 4436-nt 4585 (Fig. 3a). In mapping the putative IRES within the nt 6095-nt 6394 region, the putative IRES activity of nucleotide sequence 6245-6394 was found to be greater than 0.7-fold that of the full-length nucleotide sequence 6095-6394 (Fig. 3c). Further deletion mutation of nucleotide sequence 6245-6394 led to a significant reduction in putative IRES activity (Fig. 4c). Similar experimental results were found in mapping putative IRESs within the nt 6629-nt 6928 region (Fig. 3e and Fig. 4e). The putative IRES activity of the truncated region was greater than 0.7-fold that of the full-length sequences, showing that the truncated region was essential for putative IRES activity [32].
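The two thresholds used in this analysis (a 3-fold cutoff for calling a putative IRES and a 0.7-fold cutoff for retaining activity after truncation) can be written as a short Python sketch. The function and constant names are hypothetical, and the inputs are assumed to be F-Luc/R-Luc ratios already expressed relative to the negative control.

```python
IRES_FOLD_THRESHOLD = 3.0   # "at least 3-fold" over the negative control
TRUNCATION_RETENTION = 0.7  # "greater than 0.7-fold" of the intact sequence


def is_putative_ires(fold_over_negative):
    """Return True if the normalized F-Luc/R-Luc ratio meets the 3-fold criterion."""
    return fold_over_negative >= IRES_FOLD_THRESHOLD


def truncation_retains_activity(truncated_fold, intact_fold):
    """Return True if a truncated fragment keeps more than 0.7-fold of the intact activity."""
    return truncated_fold > TRUNCATION_RETENTION * intact_fold


# Hypothetical values: an intact fragment at 10-fold and a truncation at 8-fold.
intact_ok = is_putative_ires(10.0)
truncation_ok = truncation_retains_activity(8.0, 10.0)
```

Under these rules, a truncation that passes `truncation_retains_activity` would be taken to contain the putative IRES, and mapping would continue on that shorter fragment.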
After truncation, the putative IRES activity levels of nucleotide sequences 4952-5101 and 5027-5176 were much higher than that of nucleotide sequence 4877-5176 (Fig. 3b). The putative IRES activity increased appropriately after further truncation (Fig. 4a). Similarly, in mapping the putative IRES within nt 6419-nt 6718, putative IRES activity was found to be slightly increased when the nucleotide sequences were truncated stepwise (Fig. 3d and 4d). This phenomenon might have arisen from alleviation of steric hindrance on ribosome binding to the IRES or IRES trans-acting factors (ITAFs) after truncation. The putative IRES activity was almost equal in nucleotide sequences 5002-5101 and 5027-5126. In addition, the main segment of these two regions overlapped. Therefore, the putative IRES was located roughly within the 125-nucleotide region spanning nucleotides 5002-5126. Ribosome profiling showed that the ribosome positions in the genome "snapshot" were in the abovementioned putative IRES regions [41].
According to our previous study [16], to exclude the possibility of alternative splicing in the constructed vectors, copies of the two cistrons were detected by RT-qPCR after transfection. The mRNA copy numbers of F-Luc and R-Luc were not significantly different, indicating that no major aberrant splicing isoforms were contained in the constructed vectors (S Fig. 3 and S Table 1).
The use of TG as an inducer of ERS has been reported in previous studies [23,42]. Our results indicated that the putative IRESs can effectively initiate the expression of GFP in vitro (Fig. 6). A stable secondary structure is essential for putative IRES activity [45]. Obvious stem loops were found in the secondary structures of the putative IRESs. The shuffled sequence was much less likely to form a stable secondary structure than the putative IRES (supplementary materials 4).
The potential secondary structure in the shuffled sequence might lead to insignificant cap-independent translation.
In general, IRES elements are located within the 5′-UTRs of cellular or viral mRNAs upstream of the AUG initiation codon and initiate the translation of full-length proteins [46]. However, several reports have shown that IRESs within the coding regions of viral genomes drive protein translation. The RNA genome of HIV-1 contains two IRESs. The first one is located in the 5′-UTR and mediates viral replication during the G2/M phase of the cell cycle [26]. The other one is located within the coding region of gag and initiates the expression of a novel Gag isoform through an AUG initiation codon in the gag coding region. This Gag isoform may participate in HIV-1 replication in vitro [47]. In addition, the HIV-2 virion contains three isoforms of Gag (57 kDa, 50 kDa and 44 kDa), and the two N-terminally truncated shorter isoforms are translated via an IRES element located in the coding region of Gag. As integrated Gag polyproteins participate in viral capsid assembly and genome packaging, the two shorter Gag isoforms might be synthesized independently and perform other functions of Gag [40]. In the canonical mechanism, a single integrated polyprotein is synthesized through the IRES in the 5′-UTR of viral genomic RNA and is then cleaved by viral proteases 2A and 3C to generate structural and nonstructural proteins [21]. Synthesizing the entire polyprotein to meet a possible requirement for only certain viral proteins during the infection cycle of HRV16 might cost time and waste energy.
Considering our results, we concluded that the translation of proteins with various molecular weights is mediated through different putative IRESs in the HRV16 genome. The translational efficiency of viral nonstructural proteins is inversely related to the order of the corresponding protein coding region in the genome. A segment of 3D (52.3 kDa) dependent on the putative IRES within nt 6619-nt 6718 or nt 6629-nt 6778 was translated in the highest amount, and whether this protein is conducive to viral replication needs further study. In addition, the translation of a protein containing a segment of 3B and the complete 3CD protein was initiated by the putative IRES located between nucleotides 5002 and 5126, and the translation of a protein containing a segment of the 2C protein was initiated by a putative IRES located between nucleotides 4286 and 4585. These nonstructural proteins efficiently promote virus replication.

Tables

Table 1. The positions of successive nucleotide sequences with more than 150 reads basically covered those of the 5 validated IRESs identified above.

The F-Luc/R-Luc ratio relative to that of the negative control (F-Luc/R-Luc/N) was calculated. Experiments were repeated independently three times. *P, positive. N, negative.