Phylogenetic and codon usage analysis for replicase and capsid genes of porcine circovirus 3

Porcine circovirus type 3 (PCV3) is a highly contagious virus of the family Circoviridae that causes severe dermatitis and nephropathy syndrome. PCV3 is now distributed worldwide and has caused huge economic losses to the swine industry. Replicase (Rep) and capsid (Cap) are the two major proteins encoded by PCV3. Because a large number of new PCV3 isolates have been reported in the past few years and the codon usage pattern of the Rep and Cap genes has not been studied, phylogenetic and codon usage analyses of these two genes were performed. Phylogenetic analyses showed that Rep genes of PCV3a were dispersed with no clear clusters, whereas those of PCV3b clustered into two groups, and that Cap genes clustered into distinct clades according to genotype. Relative synonymous codon usage (RSCU) analysis revealed the presence of codon usage bias, and effective number of codons (ENC) analysis showed that this bias was low. The ENC-GC3s plot indicated that both mutational pressure and other factors shaped PCV3 codon usage, and neutrality plot analysis showed that natural selection was the main force influencing the codon usage pattern. These results provide basic data on the codon usage pattern of the Rep and Cap genes and a better understanding of the evolution and potential origin of PCV3.


Introduction
PCV3 is an emerging circovirus that has been detected in a wide range of species and is associated with several syndromes in pigs (Kedkovid et al. 2018). PCV3 has now spread worldwide with high positivity rates (Bera et al. 2020;Kwon et al. 2017;Yuzhakov et al. 2018). PCV3 is a nonenveloped, single-stranded circular DNA virus and a member of the genus Circovirus in the family Circoviridae (Palinski et al. 2017;Phan et al. 2016). The PCV3 genome is 2000 nucleotides long and contains three open reading frames (ORFs) (Saraiva et al. 2018). ORF1 and ORF2, the two major ORFs, encode the replicase (Rep) and capsid (Cap) proteins, respectively, in opposite directions. The Rep protein is associated with viral replication, and the Cap protein is the major structural protein, containing immunological epitopes associated with viral entry and neutralization (Klaumann et al. 2018;Mankertz et al. 2004).
Research on codon usage bias is an important field that benefits molecular biology and genetics, for example in determining the origin and evolution of species and predicting gene expression levels (Butt et al. 2014;Yu et al. 2021a). Although several previous studies reported the codon usage bias of other circoviruses, such as PCV1 and PCV2 (Chen et al. 2014) and duck circovirus (Xu et al. 2015), there are few reports on PCV3. Here, a comprehensive codon usage analysis of the Rep and Cap genes obtained from 590 PCV3 complete sequences was performed to explore the mechanism of codon distribution and the evolution of the virus.

Xianglong Yu and Kuipeng Gao have contributed equally to this work.

Sequence data
Complete genomic sequences of 590 PCV3 isolates were obtained from the GenBank database (http://www.ncbi.nlm.nih.gov) up to 10 February 2021. Strains lacking the collection date, country or host were excluded. Four genotypes of PCV3, PCV3a-1, PCV3a-2, PCV3a-IM and PCV3b, were distinguished by unique epitope sequences and amino acid sites 122, 320, 323, 373 and 446 (Li et al. 2018a). Detailed information on the selected PCV3 strains is provided in Supplementary Table S1. The individual Rep and Cap genes were extracted from each complete genomic sequence for subsequent analysis.
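Because the Rep and Cap ORFs are encoded in opposite directions on a circular genome, cutting them out of each GenBank record can involve wrapping around the origin and taking the reverse complement. The following Python sketch shows one way to do this; the function names and the toy sequence are hypothetical, and it assumes 1-based inclusive coordinates read from each record's own CDS annotation (no PCV3 coordinates are assumed here).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (ambiguity codes are left unchanged)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def extract_cds(genome: str, start: int, end: int, strand: str = "+") -> str:
    """Cut a coding sequence out of a circular genome.

    start/end are 1-based, inclusive positions on the + strand of the record.
    If end < start, the feature is assumed to span the origin of the circular genome.
    strand="-" returns the reverse complement, i.e. the gene read in its own direction.
    """
    genome = genome.upper()
    if start <= end:
        region = genome[start - 1:end]
    else:  # feature wraps around the origin
        region = genome[start - 1:] + genome[:end]
    if strand == "+":
        return region
    if strand == "-":
        return reverse_complement(region)
    raise ValueError("strand must be '+' or '-'")


# Toy 15-nt "genome" used only for illustration.
toy = "ATGAAACCCGGGTTT"
print(extract_cds(toy, 1, 6))       # ATGAAA (forward ORF)
print(extract_cds(toy, 13, 3))      # TTTATG (feature spanning the origin)
print(extract_cds(toy, 4, 9, "-"))  # GGGTTT (ORF on the opposite strand)
```

Handling the wrap-around explicitly matters only for features that cross position 1; otherwise the function reduces to a slice plus an optional reverse complement.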

Recombination and phylogenetic analysis
Potential recombination events were detected using RDP4 software with the BOOTSCAN, 3SEQ, CHIMAERA, MAX-CHI, GENECONV, SISCAN, and RDP methods (Martin et al. 2015). The sequence type was set to circular and the P value cutoff to 0.05. Genomes were considered recombinant if a recombination event was supported by at least three of these seven methods (P value < 0.01) or if the recombination score was above 0.6 (Liu et al. 2019;Wang et al. 2017). All Rep and Cap coding sequences were aligned with ClustalW, and phylogenetic trees were constructed using the neighbor-joining (NJ) method in MEGA7.
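The consensus rule for calling a recombinant genome can be written as a small predicate. This is a sketch under stated assumptions: the method names are those listed above, the P values would come from RDP4's output table, `score` stands for the recombination score mentioned in the text (its computation is not reproduced), and the example P values are invented for illustration.

```python
from typing import Mapping, Optional

# The seven RDP4 detection methods named in the text.
METHODS = ("RDP", "GENECONV", "BOOTSCAN", "MAXCHI", "CHIMAERA", "SISCAN", "3SEQ")


def is_recombinant(
    p_values: Mapping[str, float],
    score: Optional[float] = None,
    alpha: float = 0.01,
    min_methods: int = 3,
    min_score: float = 0.6,
) -> bool:
    """Consensus rule: at least min_methods methods with P < alpha, or score > min_score.

    p_values maps method name -> P value; methods that did not detect the event are omitted.
    """
    supporting = sum(1 for m in METHODS if m in p_values and p_values[m] < alpha)
    return supporting >= min_methods or (score is not None and score > min_score)


# Hypothetical P values for one candidate event (not taken from the study).
event = {"RDP": 2e-4, "MAXCHI": 3e-3, "3SEQ": 8e-5, "GENECONV": 0.04}
print(is_recombinant(event))  # True: three methods have P < 0.01
```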

Nucleotide composition analysis
The frequencies of A, U, C and G and the G + C content were analyzed using BioEdit. The frequencies of each nucleotide at the third position of synonymous codons (A3s, C3s, U3s and G3s) were calculated with CodonW 1.4.4, which was also used to compute the GC content at the third position of synonymous codons (GC3s). Detailed results are provided in Supplementary Tables S2 and S3.
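For readers who want to check such values outside CodonW, the sketch below computes overall base frequencies and the third-position composition of synonymous codons. It is an approximation, not CodonW's code: it counts every sense codon except AUG and UGG in a single denominator, whereas CodonW's base-specific A3s, U3s, G3s and C3s may use different denominators, so small differences are possible. The function names are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the TCAG codon ordering.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def clean(seq: str) -> str:
    """Upper-case and write U as T so RNA- or DNA-style input both work."""
    return seq.upper().replace("U", "T")


def base_composition(seq: str) -> dict:
    """Overall A, U(T), G, C and G+C frequencies of a coding sequence."""
    seq = clean(seq)
    n = len(seq)
    freqs = {"A": seq.count("A") / n, "U": seq.count("T") / n,
             "G": seq.count("G") / n, "C": seq.count("C") / n}
    freqs["GC"] = freqs["G"] + freqs["C"]
    return freqs


def third_position_synonymous(seq: str) -> dict:
    """A3s, U3s, G3s, C3s and GC3s over synonymous codons (Met, Trp and stops excluded)."""
    seq = clean(seq)
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    syn = [c for c in codons if c in CODE and CODE[c] not in "MW*"]
    if not syn:
        raise ValueError("no synonymous codons found")
    out = {label: sum(c[2] == base for c in syn) / len(syn)
           for label, base in (("A3s", "A"), ("U3s", "T"), ("G3s", "G"), ("C3s", "C"))}
    out["GC3s"] = out["G3s"] + out["C3s"]
    return out


# ATG, TGG and TAA are excluded, leaving GCA and GCC.
print(third_position_synonymous("ATGGCAGCCTGGTAA"))
```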

Relative synonymous codon usage (RSCU)
The RSCU value is the ratio between the observed frequency of a synonymous codon and its expected frequency if all codons for that amino acid were used equally; it is an important measure for standardizing codon usage bias. The RSCU index is calculated as follows:

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$

where $X_{ij}$ is the number of occurrences of the jth codon for the ith amino acid and $n_i$ is the number of synonymous codons (the degeneracy) of the ith amino acid. An RSCU value of 1 indicates no bias, that is, equal usage of the synonymous codons for that amino acid. An RSCU value > 1.0 indicates positive codon usage bias, whereas a value < 1.0 indicates negative codon usage bias. Codons with RSCU values higher than 1.6 or lower than 0.6 are considered over-represented or under-represented, respectively (Yu et al. 2021b). The RSCU value of each codon was computed with CodonW, and the RSCU values of the host (swine) were downloaded from the Codon Usage Database (http://www.kazusa.or.jp/codon/).
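The RSCU formula translates directly into code. The sketch below uses the standard genetic code, skips amino acids that are absent from the sequence (their RSCU is undefined), and applies the 1.6/0.6 thresholds; the function names are illustrative and not CodonW's.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Synonymous codon families: 59 codons once Met, Trp and the stop codons are excluded.
FAMILIES = defaultdict(list)
for codon, aa in sorted(CODE.items()):
    if aa not in "MW*":
        FAMILIES[aa].append(codon)


def rscu(seq: str) -> dict:
    """RSCU_ij = X_ij / (sum_j X_ij / n_i) for each codon of the amino acids present in seq."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for aa, family in FAMILIES.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so it is skipped
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


def classify(value: float, high: float = 1.6, low: float = 0.6) -> str:
    """Label a codon with the over-/under-representation thresholds from the text."""
    if value > high:
        return "over-represented"
    if value < low:
        return "under-represented"
    return "neither"


for codon, value in rscu("GCAGCAGCAGCC").items():
    print(codon, value, classify(value))
```

Per-strain values would then be averaged across strains to obtain mean RSCU values such as those used for the over- and under-representation calls.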

Effective number of codon (ENC) and ENC-plot analysis
The degree of codon usage bias can be quantified by ENC analysis (Liu et al. 2010). The ENC value reflects the extent of preference for particular synonymous codons and is an absolute measure ranging from 20 to 61 (He et al. 2019): a value of 20 indicates maximal codon bias (only one codon used for each amino acid), whereas 61 indicates no bias. In general, a coding sequence with an ENC value ≤ 35 is considered to have significant codon usage bias. ENC values of the Rep and Cap genes of each PCV3 sequence were calculated using CodonW. ENC-plot analysis was used to detect the factors influencing codon usage variation, and the ENC-GC3s plot was generated using GraphPad Prism 8. The expected ENC value for each GC3s value was calculated using the following equation:

$$\mathrm{ENC}_{\text{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

where s is the GC3s value. If codon usage is constrained only by mutation pressure, points will lie on or near the expected curve; if other factors also constrain codon usage, observed ENC values will lie well below the expected curve (Bera et al. 2017).
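A minimal sketch of ENC and the expected-ENC curve, assuming Wright's original estimator (homozygosity F averaged within the 2-, 3-, 4- and 6-fold degeneracy classes of the standard code). CodonW's own implementation may treat rare or missing amino acids differently, so values need not match it exactly; averaging F2 and F4 when Ile is absent is a commonly used fallback, included here as an assumption.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

FAMILIES = defaultdict(list)
for codon, aa in sorted(CODE.items()):
    if aa not in "MW*":
        FAMILIES[aa].append(codon)

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def enc(seq: str) -> float:
    """Wright's effective number of codons: Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    f_values = defaultdict(list)
    for family in FAMILIES.values():
        n = sum(counts[c] for c in family)
        if n < 2:
            continue  # F is undefined for fewer than two codons
        sum_p2 = sum((counts[c] / n) ** 2 for c in family)
        f_values[len(family)].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # assumed fallback when Ile is absent
    if any(mean_f.get(k, 0) <= 0 for k in CLASS_SIZES):
        raise ValueError("sequence too short to estimate F for every degeneracy class")
    nc = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(nc, 61.0)


def expected_enc(gc3s: float) -> float:
    """Expected ENC under mutation pressure alone: 2 + s + 29 / (s^2 + (1 - s)^2)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


print(expected_enc(0.5))  # 60.5
```

Plotting each gene's observed `enc(...)` against its GC3s, together with `expected_enc` over s in [0, 1], gives the ENC-GC3s plot described above.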

Principal component analysis (PCA)
PCA is a multivariate statistical method that analyzes the relationships between variables and samples to identify major trends of variation (Nasrullah et al. 2015). PCA was performed with SPSS software (version 22) to explore trends in the codon usage pattern of the Rep and Cap genes among PCV3 strains. Specifically, the RSCU values of each gene in each PCV3 strain were represented as a 59-dimensional vector corresponding to the 59 synonymous codons (excluding AUG, UGG and the three stop codons), and these vectors were then transformed into uncorrelated variables (principal components). The first two axes account for most of the variation in codon usage among genes. PCA plots were constructed from the first two axes and drawn with GraphPad Prism 8.0.
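The PCA step can be sketched with NumPy via a singular value decomposition of the centred strains x codons RSCU matrix. Whether the SPSS analysis used the covariance or the correlation matrix is not stated in the text, so this sketch only centres the columns; the input is random stand-in data, not PCV3 RSCU values, and the function name is hypothetical.

```python
import numpy as np


def pca(rscu_matrix, n_components: int = 2):
    """PCA of a strains x 59 RSCU matrix via SVD of the column-centred data.

    Returns (scores, explained_variance_ratio, loadings) for the first n_components axes.
    """
    X = np.asarray(rscu_matrix, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each codon column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # strain coordinates on each axis
    ratio = S ** 2 / np.sum(S ** 2)  # share of total variance captured by each axis
    return scores, ratio[:n_components], Vt[:n_components]


# Random stand-in data: 20 hypothetical strains x 59 codons.
rng = np.random.default_rng(0)
scores, ratio, loadings = pca(rng.normal(1.0, 0.3, size=(20, 59)))
print(scores.shape, ratio)
```

The two columns of `scores` are the coordinates that would be plotted against each other, coloured by year, continent or genotype.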

General average hydropathicity (Gravy) and aromaticity (Aroma)
Gravy and Aroma are two major indices of translation and natural selection (Xu et al. 2015): Gravy is the mean hydropathy of the amino acids in the encoded protein, and Aroma is the frequency of aromatic amino acids. Both values were calculated with CodonW 1.4.4.
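Both indices are simple averages over the translated protein. The sketch below uses the Kyte-Doolittle hydropathy scale for Gravy and counts Phe, Trp and Tyr as aromatic; treat it as an illustration rather than CodonW's exact code. Biopython's `ProteinAnalysis` class (`Bio.SeqUtils.ProtParam`) also provides `gravy()` and `aromaticity()` methods if a library routine is preferred.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")


def _residues(protein: str) -> list:
    """Keep only the 20 standard amino acids (drops stop '*' and ambiguity codes)."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in input")
    return residues


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    residues = _residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein: str) -> float:
    """Fraction of residues that are aromatic (Phe, Trp, Tyr)."""
    residues = _residues(protein)
    return sum(aa in AROMATIC for aa in residues) / len(residues)


print(gravy("MKFW"), aromaticity("MKFW"))
```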

Neutrality plot analysis
Neutrality plot analysis was performed to explore the effects of natural selection and mutation pressure on codon usage by plotting GC12s (the mean GC content at the first and second codon positions) against GC3s (Yu et al. 2021b). Each point represented the Rep or Cap gene of an individual PCV3 strain, and a regression line was fitted. If the regression line lay near the diagonal (slope = 1), mutation pressure was considered the dominant force shaping codon usage bias, with weak external selection pressure. Conversely, if the slope tended to 0, natural selection was considered the main force shaping codon usage. The neutrality plot was drawn using GraphPad Prism 8.0.
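The neutrality plot reduces to computing GC12 and GC3 for each gene and fitting a least-squares line of GC12 on GC3; the slope is commonly read as the relative neutrality and 1 − slope as the relative constraint. This sketch assumes GC content is computed over all non-stop codons, which may differ slightly from the GC12s/GC3s definitions used with CodonW; the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def gc12_gc3(seq: str) -> tuple:
    """Return (GC12, GC3) for one coding sequence, ignoring stop codons."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    codons = [c for c in codons if c not in STOPS]
    gc = [sum(c[pos] in "GC" for c in codons) / len(codons) for pos in range(3)]
    return (gc[0] + gc[1]) / 2, gc[2]


def least_squares(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def neutrality(seqs):
    """Regress GC12 on GC3; slope = relative neutrality, 1 - slope = relative constraint."""
    points = [gc12_gc3(s) for s in seqs]
    gc12 = [p[0] for p in points]
    gc3 = [p[1] for p in points]
    slope, intercept = least_squares(gc3, gc12)
    return {"slope": slope, "intercept": intercept, "relative_constraint": 1 - slope}


# Toy genes whose GC3 varies while GC12 stays fixed: slope 0, i.e. full constraint.
print(neutrality(["GCAGCA", "GCCGCA", "GCCGCC"]))
```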

Correlation analysis
All correlation analyses among nucleotide compositions, codon compositions, Gravy, Aroma and the principal axes were performed using Pearson's correlation coefficient.
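Pearson's r and the t statistic used to test it can be computed as follows; this is a plain-Python sketch for illustration. For P values, `scipy.stats.pearsonr(x, y)` returns both r and a two-sided P value.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson product-moment correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length >= 3")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), compared against Student's t with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

For example, `pearson_r(enc_values, gc3s_values)` would correlate per-strain ENC with GC3s, where both lists are hypothetical inputs of equal length.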

Recombination and phylogenetic analysis
Four recombinant complete genomes (MH229786, MF318448, MT183690 and MF405276) were detected by RDP4. These recombinant isolates were excluded, and the remaining coding sequences were used in subsequent analyses. Phylogenetic analyses showed that Rep genes of the PCV3a genotypes were dispersed with no clear clusters, whereas those of PCV3b clustered into two groups (Fig. S1), and that Cap genes clustered into distinct clades according to genotype (Fig. S2).

Nucleotide compositions of the Rep and Cap genes
The nucleotide content and codon usage composition of coding sequences of Rep and Cap genes were calculated.

RSCU analysis and ENC analysis
To determine the codon usage bias of the Rep and Cap genes and the effect of the host (swine), RSCU values of the 59 synonymous codons were calculated and compared with those of swine (Table 1). Ten preferred codons were shared between swine and the Rep or Cap gene of PCV3, three of which were shared between the host and both genes. Nine codons of the Rep gene were over-represented (mean RSCU value > 1.6) and fourteen were under-represented (mean RSCU value < 0.6). Meanwhile, twelve codons of the Cap gene were over-represented and twenty-two were under-represented. Moreover, the number of A/U-ended preferred codons was the same as G/C-ended for both Rep

ENC-plot analysis and correlation analysis
An ENC-GC3s plot was generated to investigate the role of mutational pressure in shaping codon usage bias. As shown in Fig. 1, all points for both the Rep and Cap genes lay below the expected curve. To further assess the effect of mutational pressure on codon usage, correlations between ENC values and each nucleotide composition were calculated; all were significant, with P values well below 0.01 (Table 2). Furthermore, correlation analyses between nucleotide compositions (A, U, G, C and GC) and codon compositions (A3s, U3s, C3s, G3s and GC3s) showed that most pairs were significantly correlated, except for U with A3s (Rep) and C with A3s (Cap) (Table 2).

Principal component analysis (PCA)
Principal component analysis was carried out on the RSCU values to detect variation in codon usage, and the first and second axes were plotted against each other with points grouped by collection year, continent and genotype. As shown in Fig. 2a-d, points grouped by year and continent were dispersed, and no clear trend could be observed. However, points of different genotypes clustered into two groups (3a and 3b) for the Rep gene (Fig. 2e) and three groups (3a-1 and 3a-IM, 3a-2, and 3b) for the Cap gene (Fig. 2f), consistent with the genotypic classification based on predicted epitopes and amino acid analysis. Subsequently, correlation analyses between the first two axes and nucleotide compositions showed that most correlations were significant (Table 3).

The role of natural selection in the codon usage bias of Rep and Cap genes
To determine the role of natural selection, correlations between Gravy or Aroma and the A, C, G, U, GC and GC3s contents were evaluated. As shown in Table 4, most correlations were significant, whereas several correlations involving the Aroma of the Cap gene were weak.

Neutrality plot analysis
A neutrality plot was constructed to determine the main factor shaping the codon usage pattern of the Rep and Cap genes (Fig. 3). The results showed a slight negative correlation between GC12s and GC3s (r = −0.39, P < 0.0001 for the Rep gene), suggesting that natural selection was the main force and mutation pressure a minor force shaping the codon usage pattern of PCV3.

Discussion
Porcine circovirus type 3 (PCV3), first identified in the USA in recent years, is an important swine pathogen associated with severe dermatitis and nephropathy syndrome (Palinski et al. 2017). Previous phylogenetic analyses showed that PCV3 is genetically diverse (Chen et al. 2019;Saraiva et al. 2018;Tan et al. 2020), and the phylogenetic analysis in this study supports this finding. Although phylogenetic data help deepen our understanding of this virus, a systematic analysis of its codon usage pattern is still needed. A previous study of host adaptation partly characterized the codon usage bias of PCV3 complete genomes using 52 strains collected up to 2017 (Li et al. 2018b).
Because many new PCV3 isolates have been reported in the past four years and the codon usage pattern of the Rep and Cap genes had not been studied, a comprehensive analysis was performed to explore the mechanism of codon distribution and the evolutionary process of PCV3. In this study, several analyses were used to quantify codon usage bias. Nucleotide composition analysis indicated that codon usage bias exists in PCV3 genomes. RSCU analysis showed that a large proportion of codons were over- or under-represented (23 of 59 in Rep and 34 of 59 in Cap), further confirming codon usage bias and suggesting that the virus has a relatively stable nucleotide composition at synonymous codons. Moreover, about half of the preferred codons were shared between the host and the PCV3 Rep or Cap genes, indicating both coincident and antagonistic codon usage patterns relative to the host, consistent with a previous study (Li et al. 2018b). In the ENC analysis, all ENC values were higher than 35, indicating low codon usage bias in both genes. Notably, the mean ENC value of Rep was higher than that of Cap, indicating weaker codon usage bias in Rep, which may be related to the observation that mutations occur mainly in Cap. PCA identifies major trends of variation. As shown in Fig. 2, points grouped by year or continent were dispersed on the plot, whereas points grouped by genotype clustered, reflecting the high genetic stability of PCV3 genomes.
Mutation pressure and natural selection are thought to be the two most important factors affecting codon usage bias. The ENC-GC3s plot showed that all points lay below the expected curve, indicating that both mutational pressure and other factors played a role in PCV3 codon usage. In particular, points grouped by continent were dispersed on the plot, whereas points grouped by host were relatively concentrated except for those from swine, which suggests that PCV3 may have originated in swine. Neutrality plot analysis is a widely used method for exploring the effects of natural selection and mutation pressure. The results revealed that the relative constraint (natural selection) was 89.7% for the Cap gene and 98.6% for the Rep gene, suggesting that natural selection was the main force shaping the codon usage pattern of both genes and that mutation pressure contributed more to the Cap gene than to Rep. In conclusion, a low level of codon usage bias was detected in the Rep and Cap genes of PCV3, and natural selection was the primary force shaping codon usage. These results provide basic data for a better understanding of the evolution and potential origin of PCV3.

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1007/s11259-021-09816-0.

Data availability
All data included in this study are available on request to the corresponding author.

Declarations
Consent for publication
All authors gave their consent for research publication.