Phylogenetic and Codon Usage Analysis for Capsid and Replicase Genes of Porcine Circovirus 3


 Porcine circovirus type 3 (PCV3) is a highly contagious virus of the family Circoviridae that causes severe dermatitis and nephropathy syndrome. PCV3 now has a worldwide distribution and causes huge economic losses in the swine industry. The replicase protein (Rep) and capsid protein (Cap) are the two major proteins of PCV3. Given the large number of new PCV3 isolates reported in the past few years and the lack of research on the codon usage pattern of the Rep and Cap genes, phylogenetic and codon usage analyses of these two genes were performed. Phylogenetic analysis of all strains showed no clear clusters, but almost all strains of a given genotype fell into the same clade. Relative synonymous codon usage (RSCU) analysis revealed that codon usage bias exists, and effective number of codons (ENC) analysis showed that this bias is relatively weak. The ENC-GC3s plot indicated that both mutational pressure and other factors shape PCV3 codon usage, and neutrality plot analysis showed that natural selection is the main force influencing the codon usage pattern. In summary, these results provide important basic data on the codon usage pattern of the Rep and Cap genes and a better understanding of the evolution and potential origin of PCV3.


Introduction
PCV3 is an emerging circovirus that has been detected in a wide range of species and is associated with several syndromes in pigs (Kedkovid et al. 2018). To date, PCV3 has spread worldwide with high positivity rates (Bera et al. 2020;Kwon et al. 2017;Yuzhakov et al. 2018). PCV3 is a non-enveloped, single-stranded circular DNA virus belonging to the genus Circovirus of the family Circoviridae (Palinski et al. 2017;Phan et al. 2016). Its genome is 2,000 nucleotides long and contains three open reading frames (ORFs) (Saraiva et al. 2018). ORF1 and ORF2, the two major ORFs, encode the replicase protein (Rep) and the capsid protein (Cap), respectively, in opposite directions. Rep comprises 297 amino acids and is associated with virus replication, whereas Cap, the major structural protein of 214 amino acids, contains immunological epitopes associated with viral entry and neutralization (Klaumann et al. 2018;Mankertz et al. 2004).
The study of codon usage bias is an important field that benefits molecular biology and genetics, for example in determining the origin and evolution of species and in predicting gene expression levels (Butt et al. 2014;Yu et al. 2021a). Although previous studies have reported the codon usage bias of other circoviruses, such as PCV1 and PCV2 (Chen et al. 2014) and duck circovirus (Xu et al. 2015), there are few reports on PCV3. Here, a comprehensive codon usage analysis of the replicase and capsid genes from 590 complete PCV3 sequences was performed to explore the mechanisms shaping codon usage and the evolution of the virus.

Sequence data
The complete DNA sequences of all 590 PCV3 isolates were retrieved from the GenBank database (http://www.ncbi.nlm.nih.gov) on 10 February 2021. Strains lacking a collection date, country or host were excluded. Four PCV3 genotypes (PCV3a-1, PCV3a-2, PCV3a-IM and PCV3b) were distinguished based on unique epitope sequences and amino acid sites 122, 320, 323, 373 and 446 (Li et al. 2018a). Detailed information on the selected PCV3 strains is provided in Supplementary Table S1. The ORF1 and ORF2 sequences were extracted from each complete PCV3 sequence for subsequent analysis.
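Because ORF1 and ORF2 are encoded in opposite directions on a circular genome, extracting them means reverse-complementing whichever ORF is annotated on the complementary strand and, in principle, handling an ORF that spans the sequence origin. The sketch below illustrates this under stated assumptions: coordinates are 1-based and inclusive (as in GenBank feature tables), only unambiguous bases and N are complemented, and the function name `extract_orf` and the toy sequences are hypothetical. Real ORF coordinates should be taken from each record's annotation.

```python
# Complement table for unambiguous bases (plus N); other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_orf(genome, start, end, strand="+"):
    """Cut an ORF from a circular genome using 1-based inclusive coordinates.

    If start > end, the feature is assumed to span the origin, so the region
    runs from start to the last base and continues from base 1 to end.
    For strand "-", the region is reverse-complemented so it reads 5'->3'.
    """
    if not (1 <= start <= len(genome) and 1 <= end <= len(genome)):
        raise ValueError("coordinates outside the genome")
    if start <= end:
        region = genome[start - 1:end]
    else:  # wraps around the origin of the circular genome
        region = genome[start - 1:] + genome[:end]
    return reverse_complement(region) if strand == "-" else region
```

For example, on the toy genome `"CCTAAGGGATGC"`, `extract_orf(genome, 9, 5)` joins the tail `ATGC` with the head `CCTAA` and returns `ATGCCCTAA`.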

Recombination and phylogenetic analysis
All 590 complete PCV3 sequences were used for recombination analysis. Potential recombination events were detected with RDP4 using the BOOTSCAN, 3SEQ, CHIMAERA, MAXCHI, GENECONV, SISCAN and RDP methods (Martin et al. 2015). The sequence type was set to circular and the P-value cutoff to 0.05. Genomes were considered recombinant if an event was supported by at least three of these seven methods (P < 0.01) or if the recombination score was above 0.6 (Liu et al. 2019;Wang et al. 2017). All complete coding regions (ORF1 + ORF2) were then aligned with ClustalW, and phylogenetic analysis was carried out in MEGA7. The phylogenetic tree was constructed by the neighbor-joining (NJ) method with 1000 bootstrap replicates.
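The consensus rule for calling a recombinant can be written as a small decision function. This is a hypothetical sketch, not part of RDP4 (a desktop program without a scripting interface): it assumes the per-method P values and the recombination score for one event have already been exported from the RDP4 results table into a dictionary keyed by the method names listed above. Methods that did not detect the event are simply left out of the dictionary.

```python
from typing import Mapping, Optional

# The seven detection methods listed in the text (key names are our assumption).
RDP4_METHODS = ("RDP", "GENECONV", "BOOTSCAN", "MAXCHI", "CHIMAERA", "SISCAN", "3SEQ")


def is_recombinant(p_values: Mapping[str, float], score: Optional[float] = None,
                   alpha: float = 0.01, min_methods: int = 3,
                   min_score: float = 0.6) -> bool:
    """Accept an event if >= min_methods methods report P < alpha,
    or if its recombination score exceeds min_score."""
    # A method absent from p_values is treated as not supporting the event.
    supporting = sum(1 for method in RDP4_METHODS
                     if p_values.get(method, 1.0) < alpha)
    if supporting >= min_methods:
        return True
    return score is not None and score > min_score
```

Keeping `alpha`, `min_methods` and `min_score` as parameters makes it easy to check how sensitive the set of excluded isolates is to these thresholds.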

Nucleotide composition analysis
The frequencies of A, U, C and G and the G + C content were analyzed using BioEdit. The nucleotide frequencies at the third position of synonymous codons (A3s, C3s, U3s and G3s) and the GC content at the third position of synonymous codons (GC3s) were calculated with CodonW 1.4.4. Detailed results are provided in Supplementary Tables S2 and S3.

Relative synonymous codon usage (RSCU)
RSCU is the ratio between the observed frequency of a synonymous codon and the frequency expected if all codons for that amino acid were used equally; it is an important measure for standardizing codon usage bias. The RSCU index is calculated as follows:

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$

where $X_{ij}$ is the number of occurrences of the $j$th codon for the $i$th amino acid, and $n_i$ is the number of synonymous codons (the degeneracy) of the $i$th amino acid. An RSCU value of 1 indicates no bias, i.e., equal usage of the synonymous codons for that amino acid. RSCU values > 1.0 indicate positive codon usage bias, whereas values < 1.0 indicate negative bias. Codons with RSCU > 1.6 are considered over-represented and those with RSCU < 0.6 under-represented (Yu et al. 2021b). The RSCU values of each codon were computed with CodonW, and the RSCU values of the host (swine) were downloaded from the Codon Usage Database (http://www.kazusa.or.jp/codon/).

Effective number of codon (ENC) and ENC-Plot analysis
The degree of codon usage bias can be quantified by ENC analysis (Liu et al. 2010). ENC reflects the extent of preference for synonymous codons. It is an absolute measure ranging from 20, when only one codon is used for each amino acid, to 61, when all synonymous codons are used equally (He et al. 2019). ENC-plot analysis was used to detect the factors influencing codon usage variation. The ENC-GC3s plot was generated using GraphPad Prism 8. The expected ENC value for each GC3s value was calculated as follows:

$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

where $s$ is the GC3s value. If codon usage is constrained only by mutation pressure, the points will lie on or around the expected curve. Otherwise, if several factors constrain codon usage, the observed ENC values will lie well below the expected curve (Bera et al. 2017).
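The indices defined above (GC3s, RSCU, ENC and the expected-ENC curve) can be sketched in a few lines of Python. This is an illustrative re-implementation, not CodonW: it assumes in-frame coding sequences and the standard genetic code, computes ENC with Wright's (1990) homozygosity formula, and uses Wright's averaging fix when the three-fold class (Ile) is missing. CodonW's handling of edge cases may differ, and all function names are our own.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons use U, matching CodonW's output convention.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SYNONYMS = {}  # amino acid -> its codons
for _codon, _aa in CODE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

# Synonymous families: drop stops, Met (AUG) and Trp (UGG) -> 59 codons, 18 families.
FAMILIES = {aa: syn for aa, syn in SYNONYMS.items() if aa != "*" and len(syn) > 1}


def codons(seq):
    """Split an in-frame coding sequence into codons (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def gc3s(seq):
    """GC content at the third position of synonymous codons."""
    third = [c[2] for c in codons(seq) if CODE.get(c) in FAMILIES]
    return sum(b in "GC" for b in third) / len(third)


def rscu(seq):
    """RSCU_ij = X_ij / ((1/n_i) * sum_j X_ij) for the 59 synonymous codons."""
    counts = Counter(codons(seq))
    values = {}
    for syn in FAMILIES.values():
        total = sum(counts[c] for c in syn)
        if total:  # RSCU is undefined for amino acids absent from the gene
            for c in syn:
                values[c] = counts[c] / (total / len(syn))
    return values


def enc(seq):
    """Wright's effective number of codons (20 = extreme bias, 61 = none)."""
    counts = Counter(codons(seq))
    f_by_fold = {}
    for syn in FAMILIES.values():
        n = sum(counts[c] for c in syn)
        if n < 2:
            continue  # homozygosity F needs at least two observations
        f = (n * sum((counts[c] / n) ** 2 for c in syn) - 1) / (n - 1)
        f_by_fold.setdefault(len(syn), []).append(f)
    mean_f = {k: sum(v) / len(v) for k, v in f_by_fold.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Wright's fix when Ile is absent
    families_per_fold = {2: 9, 3: 1, 4: 5, 6: 3}
    if any(mean_f.get(k, 0) <= 0 for k in families_per_fold):
        raise ValueError("sequence too short to estimate F for every fold class")
    return min(61.0, 2 + sum(m / mean_f[k] for k, m in families_per_fold.items()))


def expected_enc(s):
    """ENC expected under mutation pressure alone; s is GC3s as a fraction."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)
```

As a check on the formula, `expected_enc(0.5)` gives 60.5, the peak of the expected curve, which falls to 31 at GC3s = 0 and 32 at GC3s = 1. A strain is plotted at `(gc3s(seq), enc(seq))` against this curve.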

Principal component analysis (PCA)
PCA is a multivariate statistical method that identifies major trends of variation by analyzing the relationships between variables and samples (Nasrullah et al. 2015). Here, PCA was performed with SPSS (version 22) to explore trends in the codon usage patterns of the Rep and Cap genes among PCV3 strains. Specifically, the RSCU values of each strain for each gene were arranged as a 59-dimensional vector corresponding to the 59 synonymous codons (excluding AUG, UGG and the three stop codons), and these vectors were transformed into uncorrelated variables (principal components). The first two axes account for most of the variation in codon usage among strains. PCA plots were constructed from the first two axes and drawn with GraphPad Prism 8.0.
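The PCA step can be sketched with NumPy's singular value decomposition. This is an illustrative stand-in for the SPSS procedure, not a reproduction of it: we assume the components are extracted from standardized RSCU values (i.e., from the correlation matrix) without rotation, and the function name `pca_scores` is hypothetical. Rows are strains and columns are the 59 codons.

```python
import numpy as np


def pca_scores(rscu_matrix, n_components=2, standardize=True):
    """Project strains (rows) onto the leading principal components.

    Returns the component scores (strains x n_components) and the fraction
    of total variance explained by each returned component.
    """
    X = np.asarray(rscu_matrix, dtype=float)
    X = X - X.mean(axis=0)              # center each codon column
    if standardize:
        sd = X.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0               # constant codons stay all-zero; avoids 0/0
        X = X / sd
    # Rows of Vt are the principal axes; U * S gives the scores on those axes.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]
```

The sign of each axis is arbitrary in any PCA, so plots from this sketch may be mirrored relative to SPSS output; a covariance-based extraction (`standardize=False`) would give different axes.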

Grand average of hydropathicity (Gravy) and aromaticity (Aroma)
Gravy and Aroma are two indices related to translation and natural selection: Gravy is the average hydropathy score of the encoded amino acids, and Aroma is the frequency of aromatic amino acids (Xu et al. 2015). In this study, both values were calculated with CodonW 1.4.4.
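Both indices are simple averages over the translated protein. The sketch below assumes the Kyte-Doolittle hydropathy scale for Gravy and counts Phe, Trp and Tyr as aromatic for Aroma, which is our understanding of CodonW's definitions; the function names are illustrative.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = frozenset("FWY")


def _residues(protein):
    """Keep standard amino acids only (drops stop symbols '*' and gaps)."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return residues


def gravy(protein):
    """Mean Kyte-Doolittle hydropathy; positive values mean a hydrophobic protein."""
    residues = _residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein):
    """Fraction of residues that are Phe, Trp or Tyr."""
    residues = _residues(protein)
    return sum(aa in AROMATIC for aa in residues) / len(residues)
```

Each Rep or Cap protein sequence yields one Gravy and one Aroma value per strain, which are then correlated with the nucleotide composition indices.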

Neutrality Plot Analysis
The neutrality plot analysis was performed to assess the effects of natural selection and mutation pressure on codon usage by plotting GC12s against GC3s (Yu et al. 2021b). Each point represents an individual PCV3 strain for the Rep or Cap gene, and a regression line was fitted. If the regression line lies near the diagonal (slope = 1), mutation pressure is considered the dominant force shaping codon usage bias, with weak external selection pressure. Conversely, if the slope tends to 0, natural selection plays the main role in shaping codon usage. The neutrality plot was drawn with GraphPad Prism 8.0.
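The neutrality-plot interpretation reduces to an ordinary least-squares slope of GC12s on GC3s: the slope is read as the relative contribution of mutation pressure, and 1 - slope as the relative constraint attributed to natural selection. A minimal sketch, assuming per-strain GC12s and GC3s values have already been computed (for example with CodonW); the function name is hypothetical.

```python
def neutrality_regression(gc3s, gc12s):
    """Ordinary least-squares fit of GC12s (y) on GC3s (x), one point per strain.

    Returns the slope and intercept plus the two derived proportions used to
    interpret a neutrality plot: slope = relative neutrality (mutation pressure),
    1 - slope = relative constraint (natural selection).
    """
    if len(gc3s) != len(gc12s) or len(gc3s) < 2:
        raise ValueError("need two equal-length lists with at least two strains")
    n = len(gc3s)
    mean_x = sum(gc3s) / n
    mean_y = sum(gc12s) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3s)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3s, gc12s))
    slope = sxy / sxx
    return {
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "mutation_pressure": slope,
        "natural_selection": 1 - slope,
    }
```

With the slopes reported in the Results (0.103 for Cap and 0.014 for Rep), 1 - slope gives 0.897 and 0.986, i.e., the relative constraints of 89.7% and 98.6% cited in the Discussion.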

Results
Recombination and phylogenetic analysis
Four recombination events, involving isolates MH229786, MF318448, MT183690 and MF405276, were detected with RDP4. These recombinant isolates were removed, and the coding sequences of the remaining isolates were used in subsequent analyses. For the phylogenetic analysis, an NJ tree was constructed from the complete coding regions. As shown in Fig. S1, no clear clusters were observed, but almost all strains of a given genotype fell into the same clade.

RSCU analysis and ENC analysis
To characterize the codon usage bias of the two PCV3 genes and the influence of the host (swine), the RSCU values of the 59 synonymous codons were calculated and compared with those of swine (Table 1). Ten preferred codons were shared between swine and the PCV3 Rep or Cap gene, three of which were shared by the host and both genes.
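The text does not spell out how a "preferred" codon was defined; a common convention, assumed here, is the codon with the highest RSCU in each synonymous family. Under that assumption, the shared preferred codons reduce to set intersections. The function name and the RSCU values in the usage example are hypothetical.

```python
def preferred_codons(rscu, families):
    """Return the codon(s) with the highest RSCU in each synonymous family.

    rscu: codon -> RSCU value; families: amino acid -> its synonymous codons.
    Ties are kept, so one family can contribute more than one preferred codon.
    """
    preferred = set()
    for syn in families.values():
        observed = [c for c in syn if c in rscu]
        if not observed:
            continue  # amino acid absent: no preferred codon
        best = max(rscu[c] for c in observed)
        preferred.update(c for c in observed if rscu[c] == best)
    return preferred


# Hypothetical RSCU values for two amino acids, for illustration only.
families = {"Ala": ["GCU", "GCC", "GCA", "GCG"], "Lys": ["AAA", "AAG"]}
host = {"GCU": 0.9, "GCC": 1.7, "GCA": 0.8, "GCG": 0.6, "AAA": 0.8, "AAG": 1.2}
rep = {"GCU": 1.5, "GCC": 1.6, "GCA": 0.5, "GCG": 0.4, "AAA": 0.7, "AAG": 1.3}
cap = {"GCU": 1.8, "GCC": 1.0, "GCA": 0.7, "GCG": 0.5, "AAA": 1.1, "AAG": 0.9}

shared_host_rep = preferred_codons(host, families) & preferred_codons(rep, families)
shared_all = shared_host_rep & preferred_codons(cap, families)
```

If "preferred" instead meant any codon with RSCU > 1.0, the same intersections apply after replacing `preferred_codons` with a simple threshold filter.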

ENC-Plot analysis and correlation analysis
An ENC-GC3s plot was generated to investigate the role of mutational pressure in shaping codon usage bias. As shown in Fig. 1, all points for both the Rep and Cap genes lay below the expected curve. To determine the effect of mutational pressure on codon usage, correlations between ENC values and each nucleotide composition were calculated; all were significant, with P values well below 0.01 (Table 2). Furthermore, correlations between nucleotide composition (A, T, G, C and GC) and third-position composition of synonymous codons (A3s, T3s, C3s, G3s and GC3s) were also calculated.
The numbers represent correlation coefficient "r" values (* P < 0.05, ** P < 0.01).
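The correlation tables report r values with significance stars. The correlation method used in SPSS is not stated; the sketch below assumes Pearson's r and, to stay dependency-free, substitutes a permutation test for the t-based P value that SPSS would report. Its P values therefore approximate the same null hypothesis (no linear association) rather than reproduce SPSS output. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=9999, seed=0):
    """Two-sided permutation P value for H0: no linear association."""
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so permutations tying with r_obs count despite rounding.
        if abs(pearson_r(x, shuffled)) >= r_obs - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def stars(p):
    """Significance marks used in the tables: * P < 0.05, ** P < 0.01."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

With about 590 strains per gene, each correlation would pair, for example, the per-strain ENC values with the per-strain GC3s values; `n_perm` bounds the smallest attainable P value at `1 / (n_perm + 1)`.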

Principal component analysis (PCA)
Principal component analysis was carried out on the RSCU values to detect variation in codon usage, and the first and second axes were plotted against each other with strains grouped by collection year, continent and genotype. As shown in Fig. 2A-D, points grouped by year and continent were dispersed, and no clear trend was observed. However, points of different genotypes clustered into two groups (3a and 3b) for the Rep gene (Fig. 2E) and three groups (3a-1 and 3a-IM, 3a-2, and 3b) for the Cap gene (Fig. 2F), consistent with the genotypic classification based on predicted epitopes and amino acid analysis. Correlation analyses between the first two axes and nucleotide composition were also performed, and most correlations were significant (Table 3).
The numbers represent correlation coefficient "r" values (* P < 0.05, ** P < 0.01).

The role of natural selection in the codon usage bias of PCV3 Rep and Cap Gene
To assess the influence of natural selection, correlations between Gravy, Aroma and A, C, G, U, GC and GC3s were calculated. As shown in Table 4, most correlations were significant, whereas several correlations involving the Aroma values of the Cap gene were weak.
The numbers represent correlation coefficient "r" values (* P < 0.05, ** P < 0.01).

Neutrality plot analysis
A neutrality plot was constructed to determine the main factor shaping the codon usage pattern of the PCV3 Rep and Cap genes (Fig. 3). GC12s and GC3s showed a weak negative correlation (r = -0.39, P < 0.0001 for the Rep gene; r = -0.283, P < 0.0001 for the Cap gene). The slope of the regression line was 0.103 for the Cap gene and only 0.014 for the Rep gene, suggesting that natural selection was the main force and mutation pressure a minor force shaping the codon usage pattern of the PCV3 Rep and Cap genes.

Discussion
Porcine circovirus type 3 (PCV3), first recognized in the USA in recent years, is an important swine pathogen associated with severe dermatitis and nephropathy syndrome (Palinski et al. 2017). Several previous phylogenetic analyses have shown the genetic diversity and evolution of PCV3 (Chen et al. 2019;Saraiva et al. 2018;Tan et al. 2020), and the present analysis of 590 strains provides a more comprehensive picture. Although phylogenetic data help deepen our understanding of this virus, a thorough and systematic analysis of its codon usage pattern is still needed. A previous study of host adaptability partly reflected the codon usage bias of PCV3 complete genomes, using 52 strains collected up to 2017 (Li et al. 2018b). Given the many new PCV3 isolates reported in the past four years and the lack of research on the codon usage pattern of the Rep and Cap genes, a comprehensive analysis of codon usage bias in these genes was performed to explore the mechanism of codon distribution and the evolutionary process of PCV3.
In this study, several analyses were used to characterize codon usage bias. Nucleotide composition analysis demonstrated that codon usage bias exists in PCV3 genomes. RSCU analysis showed that more than half of all codons were over-represented or under-represented, further revealing codon usage bias and a relatively stable nucleotide composition of synonymous codons. Moreover, about half of the preferred codons were shared between the host and the PCV3 Rep or Cap genes, indicating that PCV3 codon usage is partly coincident with and partly antagonistic to that of the host, consistent with a previous study (Li et al. 2018b). All ENC values were higher than 35, revealing relatively weak codon usage bias in both genes. Notably, the mean ENC value of the Rep gene was higher than that of the Cap gene, indicating that codon usage of the Rep gene is more stable, which is consistent with mutations occurring mainly in the Cap gene. PCA can identify major trends of variation. As shown in Fig. 2, points grouped by year or continent were dispersed on the plot, whereas points grouped by genotype clustered relatively closely, reflecting the high genetic stability of PCV3 genomes.
Mutation pressure and natural selection are thought to be the two most important factors shaping codon usage bias. In the ENC-GC3s plot, all points lay below the expected curve, indicating that both mutational pressure and other factors shape PCV3 codon usage. In particular, points grouped by continent were dispersed on the plot, whereas points grouped by host were relatively concentrated, except for those from swine, which implies that PCV3 originated in swine. Neutrality plot analysis is a widely used method to assess the effects of natural selection and mutation pressure. The relative constraint (natural selection) was 89.7% for the Cap gene and 98.6% for the Rep gene, suggesting that natural selection was the main force shaping the codon usage pattern of both PCV3 genes and that mutation pressure contributed more to the Cap gene than to the Rep gene.
In conclusion, a low level of codon usage bias was detected in the PCV3 Rep and Cap genes, and natural selection was the primary force shaping their codon usage. In the face of the growing PCV3 epidemic, this study provides further basic data on the evolution of PCV3.

Data Availability
All data included in this study are available on request to the corresponding author.