PbGP43 Genotyping Using Paraffin-Embedded Biopsies of Human Paracoccidioidomycosis Reveals a Genetically Distinct Lineage in the Paracoccidioides brasiliensis Complex

Paracoccidioidomycosis (PCM) is a systemic mycosis caused by a group of cryptic species embedded in the Paracoccidioides brasiliensis complex and by Paracoccidioides lutzii. Four species were recently inferred to belong to the P. brasiliensis complex, but the high genetic diversity found in both human and environmental samples has suggested that the number of lineages may be higher. This study aimed to assess the 43-kilodalton glycoprotein genotypes (PbGP43) in paraffin-embedded samples from PCM patients to infer the phylogenetic lineages of the P. brasiliensis complex responsible for causing the infection. Formalin-fixed, paraffin-embedded (FFPE) tissue samples from patients with a histopathological diagnosis of PCM were analyzed. DNA was extracted, and a region of the second exon of the PbGP43 gene was amplified. Products were sequenced and aligned with other available PbGP43 sequences. A haplotype network and the phylogenetic relationships among sequences were inferred. Amino acid substitutions were investigated for their potential to modify the physicochemical properties of the proteins. Six phylogenetic lineages were identified as belonging to the P. brasiliensis complex. Two lineages did not group with any of the four recognized species of the complex, and, interestingly, one of them comprised only FFPE samples. A coinfection involving two lineages was found. Five parsimony-informative sites were identified, and three of them showed radical non-synonymous substitutions with the potential to promote changes in the protein. This study expands the knowledge regarding the genetic diversity existing in the P. brasiliensis complex and shows the potential of FFPE samples in species identification and in detecting coinfections.


Introduction
Paracoccidioidomycosis (PCM) is a severe, deep systemic mycosis that is sometimes fatal, is difficult to diagnose in its early stages, and can remain latent for up to 40 years. The illness sometimes manifests as a self-limited pulmonary infection, which may persist as a latent or lifelong infection; on the other hand, upper respiratory tract paracoccidioidomycosis is a slowly progressive disease that does not heal spontaneously. PCM can also result in a disseminated disease affecting lymph nodes, liver, spleen, gut, bones, joints, and meninges, which may be refractory to treatment [1]. PCM is endemic in areas of Central and South America, and about 80% of the cases have been reported in Brazil [2,3]. However, the fact that PCM takes years to develop and that its notification is not required by the Ministry of Health makes it challenging to track and determine the exact number of cases in Brazil. Studies reported that approximately 10 million people could be infected and that 2% in endemic areas could develop the disease [4,5]. Among the systemic mycoses, PCM presents the highest mortality rate, reaching 1.65 deaths per million inhabitants [6]. A recent study that analyzed available data on fungal infections, obtained from autopsy reports in Brazil between 1930 and 2015, corroborated that PCM was the most frequent infection in this period [7].
Paracoccidioides species are the etiological agents of PCM, comprising the Paracoccidioides brasiliensis complex and Paracoccidioides lutzii [8][9][10][11]. The infection occurs by inhalation of conidia from the environment. For the disease development, it is essential that conidia (infective form) develop as yeast cells (pathogenic form) in the pulmonary alveoli, an event that is dependent on the temperature increase once inside the host [12]. There is no evidence of human-to-human transmission [13]. Paracoccidioides brasiliensis has been rarely isolated from soil, but isolates have been successfully obtained from armadillos (Dasypus novemcinctus), which are considered important natural reservoirs [14,15]. Some manifestations of the disease were also observed in dogs [16,17]. Since the incidence in endemic areas is mainly associated with deforestation, armadillo hunting, and agriculture practices, PCM becomes an increasingly important public health subject insofar as deforestation to agricultural expansion affects more people [3,18].
The P. brasiliensis complex comprises four currently recognized phylogenetic species based on multilocus analyses that included microsatellites, mitochondrial and nuclear loci [19]. More recent studies using whole-genome typing also support the presence of those four species, which are reciprocally monophyletic. Paracoccidioides brasiliensis sensu stricto (former S1 phylogenetic group, recently divided into two cryptic populations, S1a and S1b) is spread across Brazil, Argentina, Bolivia, Paraguay, Peru and Venezuela; P. americana (former PS2) is found in Venezuela and Brazil; P. restrepiensis (former PS3) is found mainly in Colombia and was recently also identified in Argentina, Brazil, Venezuela and Peru; and P. venezuelensis (former PS4) was described as occurring exclusively in Venezuela but was recently detected in Brazil [8,19-25]. Nevertheless, the high genetic diversity that has been found in both human and environmental samples suggests that the number of lineages in the complex may be higher.
The main antigenic/diagnostic component described in P. brasiliensis is a glycoprotein of 43 kDa (gp43) [26]. This molecule is secreted into the exocellular medium and contains a single oligosaccharide chain. The gp43 gene was the first to be fully characterized in P. brasiliensis, being named PbGP43. The open reading frame is within a DNA fragment of 1,329 base pairs, with two exons interrupted by an intron of 78 nucleotides [27]. Early studies have revealed a high level of polymorphism in the PbGP43 sequence among P. brasiliensis isolates, most of them located at the second exon [20,28]. It was the most informative gene compared with other nuclear genes studied in P. brasiliensis, showing the highest occurrence of non-synonymous substitutions and evidence of positive selection [8,20,29]. Besides its diagnostic relevance, PbGP43 is also a vaccine candidate against P. brasiliensis [30], specifically its P10 peptide holding a protective T-cell epitope, which can induce protective immune responses in vitro and in vivo, promising therapeutic alternatives to combat PCM [31].
One of the main obstacles to the reliable diagnosis of PCM, especially when only early symptoms are present, is that it can be mistaken for other mycoses. The clinical profile and characteristic features are easily detected only in the late stage, based on serology, histopathology, and culture. According to Guarner and Brandt [32], histopathology is still considered a fast and inexpensive way of diagnosing fungal infections. However, biopsies are the only material available in some cases, so it is highly relevant to improve the molecular assays using this type of material. Specifically, polymerase chain reaction (PCR)-based strategies complement the histopathology findings because they can define the etiological agent of a fungal infection at the species or genus level [32]. Although PCR is sensitive and specific for the identification of P. brasiliensis in various types of samples, as far as we know, few research groups have carried out analyses using formalin-fixed, paraffin-embedded (FFPE) tissue from human biopsies [17,33-37], mainly because of the limitations of amplifying DNA from this type of sample. Factors such as the size of the fragment to be amplified, the fixation technique, the storage time of the material, and the use of different DNA extraction kits and protocols may affect the efficacy of the method when applied to paraffin-embedded biopsies [38].
Molecular studies considering the P. brasiliensis complex are extremely important for understanding the extent of genetic diversity, their geographic distribution, and different clinical manifestations. In this context, this study aimed to assess genotypes of the PbGP43 gene in FFPE tissue samples from PCM patients to infer the phylogenetic lineages embedded in the P. brasiliensis complex responsible for causing the infection.

Samples
We analyzed 31 tissue samples stored as FFPE blocks from 20 patients with histopathological diagnosis of PCM. The period of sample collection was from 1998 to 2014. Tissue specimens were fixed in 10% buffered formalin and embedded in paraffin blocks. DNA extraction and purification procedures were performed using the QIAamp™ DNA Mini Kit (QIAGEN, Hilden, Germany), according to the manufacturer's instructions.
To assess the quality of the extracted DNA, a PCR was performed with primers for the human β-globulin housekeeping gene (β1: 5′-CAA CTT CAT CCA CGT TCA CC-3′; β2: 5′-GAA GAG CCA AGG ACA GGT AC-3′) [39]. The samples were subjected to PCR in a final volume of 25 µl, containing 40 ng of the genomic DNA, 2.0 µM of each primer, and 12.5 µl of PCR Master Mix 2X (Promega, Madison, USA), with the remaining volume made up of sterile distilled water. PCR amplifications were performed in a GeneAmp PCR System 9700 thermal cycler under the following conditions: an initial 5-min denaturation step at 94°C, followed by 40 cycles of 94°C (1 min), 55°C (1 min), and 72°C (1 min), and a final 7-min extension step at 72°C. PCR products were visualized with ethidium bromide using 1.5% agarose gel electrophoresis.

PbGP43 Amplification and DNA Sequencing
Each DNA sample was used as a template for PCR amplification of the PbGP43 gene, using a nested PCR method with the two primer pairs [40] listed in Table 1. The samples were subjected to a first PCR round in a final volume of 25 µl, containing 40 ng of the genomic DNA, 2.0 µM of each primer (forward primer Para I and reverse primer Para II), and 12.5 µl of PCR Master Mix 2X (Promega, Madison, USA), with the remaining volume made up of sterile distilled water. The PCR protocol consisted of an initial 5-min denaturation step at 94°C, followed by 35 cycles of 94°C (30 s), 55°C (30 s), and 72°C (30 s), and a final 5-min extension step at 72°C.
The products of the first PCR were used as templates in a second round of PCR with the inner primer set comprising the forward primer Para III and the reverse primer Para IV (Table 1). The reaction mixture (25 µl) consisted of 1 µl of the amplification products of the first reaction, 2.0 µM of each primer, and 12.5 µl of PCR Master Mix 2X (Promega, Madison, USA), with the remaining volume made up of sterile distilled water. The nested PCR conditions were 94°C for 5 min, followed by 40 cycles at 94°C (30 s) and 72°C (1 min), and one final cycle of 72°C (5 min). PCR amplifications were performed in a GeneAmp PCR System 9700 thermal cycler. In both reactions, DNA from Histoplasma capsulatum (closely related to Paracoccidioides spp.) and DNA from a human without PCM were used as negative controls. PCR products were visualized with ethidium bromide using 1.5% agarose gel electrophoresis.
PCR products of the second PCR were purified from the agarose gel using the PureLink® Quick Gel Extraction Kit (Invitrogen®, Carlsbad, USA). DNA sequencing was performed using the DYEnamic™ ET Dye Terminator Kit (GE Healthcare, Piscataway, USA), with Thermo Sequenase® II DNA polymerase (ThermoFisher Scientific, USA), in the MegaBACE 1000 capillary sequencer. Sequences on both strands were analyzed using the software BioEdit [41], and the sequencing quality was assessed with the Chromas program (http://technelysium.com.au/wp/chromas/).

Haplotype Network, Phylogenetic Inferences, and Protein Structure Analyses
All sequences obtained in the present study were aligned to other PbGP43 sequences available in the GenBank database (Supplementary Material 1) with MAFFT v.7 [42], using default settings. Regarding the sequences coming from the database, we considered for analysis only the gene region amplified in this study. We used DnaSP v.6 [43] to create a genotype network file (Roehl data file). The haplotype network was generated by the median-joining method [44] with Network v.10.2 (http://www.fluxus-engineering.com/), using the criterion connection cost, epsilon 0, and the MJ square option active.
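As an illustration of the first step behind a haplotype network, the sketch below groups aligned sequences that are identical over the analyzed region into haplotypes. It is a minimal example under stated assumptions, not the procedure implemented in DnaSP or Network: the function name `collapse_haplotypes` and the toy sequences are hypothetical, and gaps or ambiguous bases are compared literally rather than handled as those programs would.

```python
from collections import defaultdict


def collapse_haplotypes(aligned):
    """Group aligned sequences (name -> sequence) that are identical.

    Hypothetical helper for illustration only: every character,
    including gaps, is compared literally after upper-casing.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")

    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[seq.upper()].append(name)

    # Label haplotypes H1, H2, ... by decreasing frequency;
    # ties are broken alphabetically by sequence for reproducibility.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {f"H{i}": sorted(names) for i, (_, names) in enumerate(ordered, start=1)}


# Toy alignment (invented sequences, not data from this study).
toy = {
    "s1": "ACGTAC",
    "s2": "ACGTAC",
    "s3": "ACGTAT",
    "s4": "acgtac",
    "s5": "TCGTAT",
}
haplotypes = collapse_haplotypes(toy)
for label, members in haplotypes.items():
    print(label, members)
```

The median-joining step itself, which connects haplotypes and adds median vectors, is considerably more involved and is not reproduced here.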
We inferred the phylogenetic relationships among the sequences from the same alignment used to generate the haplotype network. With the software MEGA-X [45], we estimated the best-fitting nucleotide substitution model and used this information to construct a phylogenetic tree by the Maximum Likelihood (ML) method. The robustness of branches was assessed by bootstrap analysis of 1,000 replicates [46].
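The Results report the Kimura 2-parameter model as the best-fitting substitution model. To make concrete what that model assumes (separate rates for transitions and transversions), the sketch below computes the standard Kimura 2-parameter pairwise distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. This is only an illustration of the model; it is not the ML tree search performed in MEGA-X, and the function name `k2p_distance` is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a character other than A, C, G, T
    (gaps, ambiguity codes) are skipped; this is an assumption of the sketch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(term1 * math.sqrt(term2))


# Toy pair differing by a single transition (A<->G) in ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```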
We also investigated, using the TreeSAAP v.3.2 software package [47], whether amino acid replacements found in the haplotypes can alter physicochemical properties in the respective proteins. For that, we randomly selected one sequence from each haplotype (Supplementary Material 2) and considered the significant positive z-scores in categories six to eight (p < 0.05), which are the most extreme categories of structural or functional changes.

Results
Using nested PCR, we successfully amplified a 196-bp fragment of PbGP43 exon 2 from 23 of the 31 FFPE samples obtained from 20 PCM patients (74.2% of the total) (Table 2). All samples were also used as templates for amplification of the human β-globulin housekeeping gene (data not shown), yielding the expected amplicon of 268 bp in 25 of the samples, including all 23 yielding PbGP43 amplification. Two samples were amplified only for the β-globulin gene, and six samples did not yield any amplification products.

Table 1 Primers used in the nested PCR for amplification of PbGP43
Primer     Sequence (5′-3′)                      Product size (bp)   Reference
Para I     TAG AAT ATC TCA CTC CCA GTC C         355                 Bialek et al. [40]
Para II    TGT AGA CGT TCT TGT ATG TCT TGG G
Para III   GAT CGC CAT CCA TAC TCT CGC AAT C     196
Para IV    GGG CAG AGA AGC ATC CGA AAT TGC G
PCR: polymerase chain reaction, PbGP43: gene coding for the 43-kilodalton glycoprotein from Paracoccidioides brasiliensis, bp: base-pair product.

We were successful in amplifying fungal DNA from different FFPE tissue samples: skin (n = 8), liver (n = 1), oral mucosa (n = 5), tonsil (n = 1), salivary gland (n = 1), subcutaneous cyst (n = 1), skullcap (n = 1), lymph node (n = 1), oropharynx (n = 1), lung (n = 1), and floor of the mouth (n = 1). Another point worth mentioning is the gender ratio: among the 23 samples amplified for the PbGP43 gene, 21 samples were from male patients and two from female patients, corresponding to a male:female ratio of 10.5:1.
We sequenced all 196-bp fragments of PbGP43 exon 2 resulting from the nested PCR. To maintain the translation frame of the fragment, we excluded the first nucleotide from the sequences before the analysis. The excluded nucleotide was a guanine in all sequences analyzed, including sequences from the database; therefore, its exclusion did not interfere with the results obtained. The 23 FFPE sequences were aligned to 61 other PbGP43 sequences (Supplementary Material 1), including representatives of the four species belonging to the P. brasiliensis complex [19].
The nucleotide alignment showed 163 invariable sites, 27 singleton variable sites (containing two variants), and five variable parsimony-informative sites (83.59%, 13.85%, and 2.56% of the sites analyzed, respectively). Nucleotide diversity was estimated as 0.01287. The variable parsimony-informative sites were located at positions 1, 124, 136, 172 and 194, of the nucleotide alignment (Table 3).
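The site categories above can be illustrated with a short sketch that classifies each alignment column and rechecks the reported percentages over the 195 analyzed positions (196 bp minus the excluded first nucleotide). The classification rule used here is an assumption modeled on the usual definitions: a site is parsimony-informative when at least two variants each occur in at least two sequences, and a variable site that fails this is counted as a singleton site. The function name `classify_sites` and the toy alignment are hypothetical.

```python
from collections import Counter


def classify_sites(seqs):
    """Count invariable, singleton, and parsimony-informative columns."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    counts = {"invariable": 0, "singleton": 0, "parsimony_informative": 0}
    for column in zip(*seqs):
        freq = Counter(column)
        if len(freq) == 1:
            counts["invariable"] += 1
        elif sum(1 for n in freq.values() if n >= 2) >= 2:
            # At least two variants, each present in two or more sequences.
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


# Toy alignment (invented sequences, not data from this study).
toy_counts = classify_sites(["ACGTA", "ACGTT", "ACCTA", "TCCTA"])
print(toy_counts)

# Recheck of the percentages reported in the text.
reported = {"invariable": 163, "singleton": 27, "parsimony_informative": 5}
total = sum(reported.values())  # 195 positions
percentages = {k: round(100 * v / total, 2) for k, v in reported.items()}
print(total, percentages)
```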
We identified seven haplotypes in our dataset (Supplementary Material 3), six of them belonging to the P. brasiliensis complex (Fig. 1A). Haplotype 1 (H1), Haplotype 2 (H2), Haplotype 3 (H3), and Haplotype 6 (H6) include representatives of the P. brasiliensis sensu stricto (S1), P. restrepiensis (PS3), P. venezuelensis (PS4), and P. americana (PS2), respectively. Haplotype 4 (H4) comprises only FFPE samples, and Haplotype 5 (H5) comprises three database sequences annotated as P. brasiliensis, but that did not group with any species of the complex. Haplotype 7 (H7) comprises the outgroup P. lutzii. Besides, our results indicated another haplotype belonging to the P. brasiliensis complex represented by the median vector (mv) in the haplotype network. Haplotype diversity was estimated as 0.7008. The ML phylogenetic tree (Fig. 1B) was built from the same nucleotide alignment used to infer the haplotype network and considering the Kimura 2-parameter model as the best-fitting model. The tree was rooted with P. lutzii. Our results indicated P. americana as the most basal species of the complex, followed by the H5 sequences and a polytomy involving P. brasiliensis sensu stricto, P. restrepiensis, P. venezuelensis, and H4. Phylogenetic inferences results were consistent with results of the haplotype network.
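For readers unfamiliar with haplotype diversity, the sketch below computes it with Nei's estimator, Hd = n/(n - 1) × (1 - Σ p_i²), where n is the number of sequences and p_i the frequency of haplotype i. We assume this is the estimator behind the value reported above, but the per-haplotype counts are not given in the text, so the example uses invented counts and does not attempt to reproduce 0.7008. The function name `haplotype_diversity` is hypothetical.

```python
def haplotype_diversity(counts):
    """Nei's haplotype diversity from a list of per-haplotype sequence counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sequences are required")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq)


# Invented counts: three haplotypes observed 3, 1, and 1 times.
hd_example = haplotype_diversity([3, 1, 1])
print(round(hd_example, 4))
```

The value ranges from 0, when all sequences share one haplotype, to 1, when every sequence is a distinct haplotype.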
According to our results, 14 of the 23 FFPE samples studied clustered in H4 (named the FFPE clade). The other nine samples clustered in H6 (P. americana clade). Of the 14 samples belonging to the FFPE clade, 13 were collected in São Paulo State and one in Minas Gerais State (Table 2). On the other hand, the P. americana clade was composed of samples collected not only in São Paulo State (five samples) but also in the states of Rio de Janeiro (one sample) and Rio Grande do Sul (three samples).
We identified a coinfection in Patient 3, the patient for whom more than one sample could be amplified for the PbGP43 gene (Table 2). Among the four sequences obtained from this patient, three (GR5, GR6, and GR7 from skin, liver, and salivary gland, respectively) clustered in the FFPE clade. The other sample (GR56, also collected from the skin but in a different area) clustered in the P. americana clade.
The five variable parsimony-informative sites identified among the haplotypes consisted of non-synonymous changes (Table 3). One of them, the site at amino acid position 1, is part of the P10 T-cell epitope. This was the only amino acid change detected in the P10 peptide sequence among the P. brasiliensis lineages, and the inner core of the P10 peptide was identical among them [31]. We analyzed whether these non-synonymous changes promote structural and functional changes in the proteins. We observed that three sites showed radical non-synonymous changes. Substitution at amino acid position 42, identified in the haplotypes H1, H2, H3, and H4, is associated with a change in isoelectric point (z-score = 8; p < 0.001). Substitution at amino acid position 58, found in H1, is associated with the property alpha-helical tendencies (z-score = 6; p < 0.05). Substitution at amino acid position 65, found in H3, is associated with three properties: alpha-helical tendencies (z-score = 6; p < 0.05), power to be at the N-terminal (z-score = 8; p < 0.05), and turn tendencies (z-score = 7; p < 0.05).
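The correspondence between the nucleotide positions of the parsimony-informative sites (1, 124, 136, 172, and 194) and the amino acid positions cited above can be checked with a small sketch. It assumes that the reading frame starts at position 1 of the alignment, which follows from the exclusion of the first nucleotide to maintain the translation frame; the function name `codon_of` is hypothetical.

```python
def codon_of(nt_pos):
    """Map a 1-based nucleotide position to (codon number, position within codon).

    Assumes the reading frame begins at alignment position 1.
    """
    if nt_pos < 1:
        raise ValueError("positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


informative_sites = [1, 124, 136, 172, 194]
mapping = {pos: codon_of(pos) for pos in informative_sites}
for pos, (codon, offset) in mapping.items():
    print(f"nucleotide {pos} -> amino acid {codon} (codon position {offset})")
```

Under that assumption, the five sites fall in codons 1, 42, 46, 58, and 65, which agrees with the amino acid positions 1, 42, 58, and 65 discussed in the text.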

Discussion
In the present study, we identified phylogenetic lineages of the P. brasiliensis complex by sequencing the nested PCR products of FFPE samples. The successful use of this type of material for PCM diagnosis has been demonstrated in several clinical investigations that detected fungal DNA as a marker of infection, highlighting the value of these samples for molecular diagnosis and genotyping [17,33-37]. The 23 sequenced samples were collected in four Brazilian states located in the Southeast and South regions, which are recognized as the main endemic areas of PCM in the country [3]. The majority of the samples (91.3%) were obtained from male patients. This is in agreement with a previous study that analyzed 584 PCM patients (575 of them from the Southeast region of Brazil), in which 84.2% of the samples were from male patients [48]. A predominance of male patients is characteristic of PCM, irrespective of phylogenetic species [49,50], and the reasons include the fact that circulating estrogens in women can inhibit the transformation of the inhaled conidia into yeast cells, and the greater proportion of men involved in agricultural activities in endemic areas [3,51]. The alignment of our sequences with other PbGP43 sequences available in the GenBank database showed the presence of five parsimony-informative sites in the gene region analyzed. We analyzed a fragment of the second exon of PbGP43 because it is the most polymorphic region among the P. brasiliensis genes already studied [20,28,52]. A highly polymorphic (more informative) gene region is critical in FFPE samples, since the extracted DNA is prone to fragmentation, which hampers the amplification of long PCR products [53]. The main limitations in obtaining good-quality DNA from FFPE tissue biopsies are the formalin fixation and paraffin embedding procedures, which result in DNA damage.
The difficulty of amplifying DNA from FFPE samples is well discussed and explored in the literature [32,54,55]. Despite this difficulty, only six samples did not show any amplification products, which indicates that we obtained good-quality DNA from most of the FFPE blocks.
Sequences were assigned to seven haplotypes, six of them belonging to the P. brasiliensis complex. The other haplotype comprises P. lutzii, which was first considered as P. brasiliensis, but later proposed as a new species in the genus Paracoccidioides [8]. Our results also indicated the existence of an unsampled or extinct haplotype in the P. brasiliensis complex, as represented by the median vector in the haplotype network. We found a relatively high value of haplotype diversity and a low value of nucleotide diversity, a pattern commonly observed in recently diverged species. Indeed, that is the case for the P. brasiliensis complex, since it has been demonstrated that the divergence times within P. brasiliensis lineages range between 23,000 and 575,000 years ago [21].
Each recognized species of the P. brasiliensis complex was assigned to a distinct haplotype (H1, H2, H3, and H6), but sequences of two haplotypes (H4 and H5) could not be assigned to any species of the complex. The ML phylogenetic tree supported the haplotype network results and indicated that P. americana was the most basal species of the complex, in agreement with previous studies [19,21,56]. The expected phylogeny for the other taxa in the complex is P. restrepiensis and P. venezuelensis as sister species, with P. brasiliensis sensu stricto as the sister group of this dyad [19]. However, we found H5 located between the P. americana clade and a polytomy involving the other haplotypes. The isolates that comprise H5 were obtained from human patients in Costa Rica, and as far as we know, only Takayama et al. [57] reported a phylogeny considering these isolates. They analyzed a different region of the PbGP43 gene and also obtained an exclusive clade for these isolates, which prevented the assignment of a phylogenetic species. We, therefore, considered these isolates as an indeterminate species of the P. brasiliensis complex.
Due to the limitations in getting DNA with good quality from FFPE tissue biopsies, we could not obtain long PCR fragments. Even considering the most polymorphic PbGP43 gene region (the exon 2), only five nucleotide substitutions were observed among the P. brasiliensis lineages in the gene region analyzed. Consequently, the sequence data do not provide sufficient information to resolve the relationships among some lineages in the phylogenetic tree, which caused the observed polytomy and relatively low bootstrap values. In these cases, haplotype networks have the potential to be graphically more informative, although we also have observed a loop involving H2, H4 and H5. Despite that, the substitutions found in the sequenced gene region were sufficient to identify the P. brasiliensis lineages previously described, and even enabled us to identify a new lineage. FFPE samples were collected in an area of sympatry between P. brasiliensis sensu stricto and P. americana. The former is usually reported as the most frequent species, associated with most PCM cases, while the latter is found at a considerably lower frequency [10,20,56,58,59]. Our results showed that nine out of 23 FFPE samples (39.1%) were assigned to P. americana, a higher proportion than expected. In comparison, a recent study analyzing samples collected in the southwest region of Brazil reported six out of 39 (15.4%) samples assigned to P. americana [59]. The other 14 FFPE sequences were grouped in H4 and could not be assigned to any species of the complex, as observed for H5. Unexpectedly, no FFPE samples were assigned to the species P. brasiliensis sensu stricto. Since H4 comprises only FFPE samples, we named the clade that contains these sequences as FFPE clade.
Our findings indicated genotypes not covered by the four species currently recognized, suggesting a higher genetic diversity in the P. brasiliensis complex than that described so far. Similar findings had already been described for the genus Paracoccidioides in a study that reported isolates obtained from environmental samples clustering apart from the clinical referenced strains of P. lutzii and P. brasiliensis [60]. By analyzing the rRNA universal fungal region ITS1-5.8S-ITS2, the authors found two new genotypes in soil samples and one in aerosol samples that indicated a higher genetic variation than the previously reported for the ITS in Paracoccidioides. Our study expands the knowledge regarding the genetic diversity in Paracoccidioides by finding new genotypes inside the P. brasiliensis complex and considering FFPE human tissue samples. Moreover, the higher genetic diversity found in FFPE and environmental samples, compared with isolates maintained in culture collections, may suggest that some lineages of Paracoccidioides are not successful in culture adaptation. Further studies are needed to evaluate this issue.
We obtained four PbGP43 sequences from the same patient (named Patient 3), an initially healthy 22-year-old male, an ex-rural worker living in Botucatu (São Paulo State, Brazil). He was hospitalized with a diagnosis of pneumonia, with symptoms that had lasted for over one month and worsened in the last days before hospitalization. He showed multiple cutaneous lesions, fever, and pneumonia, and was later diagnosed with the acute juvenile form of PCM, established by both the microscopic findings (histopathology) and skin test. Serology was negative for HIV, and he died a week after being admitted to the hospital. It is noteworthy that we detected a coinfection in Patient 3, from whom three samples clustered in H4 and one sample clustered with P. americana. Coinfection by Paracoccidioides has already been reported in armadillos [61], and a recent study reported a case of coinfection in humans, showing P. brasiliensis sensu stricto and P. lutzii in the same patient [37]. Here, for the first time, we report a coinfection involving two different lineages of the P. brasiliensis complex. Pinheiro et al. [37] hypothesize that the use of non-selective media most likely favors the isolation of a faster-growing or a major representative genotype in Paracoccidioides, while FFPE samples have a higher potential to detect coinfections. Likewise, our results reinforce the high potential of FFPE samples in Paracoccidioides coinfection detection.
The PbGP43 gene region analyzed comprises part of the P10 T-cell epitope, including its essential inner core formed by the conserved amino acid sequence HTLAIR. We found one amino acid substitution in the P10 peptide among the haplotypes H1 to H6, which corresponded to a variable parsimony-informative site. However, this substitution occurred between two amino acids with similar characteristics, and did not represent a radical non-synonymous change, according to our results. On the other hand, the essential inner core was identical among the six haplotypes comprising the P. brasiliensis complex, including H4 and H5 that could not be assigned to any recognized species. Our results corroborate the high conservation level of the P10 inner core among species in the complex and reinforce its potential to be used in the PCM treatment and possible vaccines. The gene region analyzed also includes the N-glycosylation site of this glycoprotein [28], which showed no nucleotide substitutions among the species of the complex, but showed one non-synonymous substitution in P. lutzii. This substitution prevents glycosylation, as already proven experimentally [62].
Interestingly, the five variable parsimony-informative sites identified among the haplotypes consisted of non-synonymous changes in one or more species of the P. brasiliensis complex, in relation to P. lutzii.
Genes encoding pathogenesis-related proteins, such as PbGP43, are likely to evolve in response to selective pressure from the host's immune system, which explains why non-synonymous substitutions are common and have been maintained by positive selection in genes that code for virulence factors [29]. Three out of five sites showed radical non-synonymous changes, i.e., amino acid changes that can promote structural and/or functional changes in the protein.
We highlight the change from lysine (basic amino acid) to glutamic acid (acidic amino acid) observed in H1, H2, H3, and H4, associated with changes in the isoelectric point (pI). Changes related to this property are essential because the electrical charge of the protein affects its solubility. That is especially relevant in the gp43 isoforms since they are primarily secreted to the extracellular environment and bound to major histocompatibility complex (MHC) molecules [63]. Indeed, a variety of pIs have been reported in gp43 [26,28,52,64,65], which have the potential to influence immunogenicity, i.e., the ability of an antigen to induce an adaptive immune response. We also found substitutions associated with alpha-helical tendencies (in H1 and H3), power to be at the N-terminal, and turn tendencies (both in H3), affecting the protein tertiary structure.
Our data shed additional light on the genetic diversity existing in the P. brasiliensis complex and contribute to the knowledge of the geographic distribution of lineages. In addition, we confirmed PbGP43 as a remarkably informative gene for further species identification in the complex, even considering a short gene fragment. It is also important to point out the correlation between molecular and phylogenetic analyses of Paracoccidioides found in FFPE human biopsies, which is rare in the literature and shows a high potential to detect cases of coinfection, an interesting aspect that is worth exploring in the future.