The MAT1 locus is conserved across the genus
In all the Colletotrichum species considered, the MAT1 locus was flanked by the DNA lyase (APN2) and cytoskeleton assembly control (SLA2) genes (Fig. 1). In all cases, the MAT1 locus harboured the MAT1-2-1 gene and a secondary MAT1-2 gene. The locus orientation was also highly conserved, with the MAT1-2-1 gene found directly downstream of the APN2 gene. The only exception was Colletotrichum chlorophyti, where the locus is inverted such that the MAT1-2-1 gene was instead found directly downstream from the SLA2 gene.
All of the Colletotrichum MAT1-2-1 genes encoded a protein possessing the HMG (high mobility group) box domain, the functional domain associated with this protein in all ascomycete fungi. The secondary MAT1-2 gene present at this locus had not been previously characterized or named and did not show any similarity to other previously described MAT1-2-associated genes. It also did not possess a recognizable functional domain. It has thus been provided with the name MAT1-2-13 in accordance with accepted MAT gene nomenclature (37). Consistent with all previously published research, the MAT1-1-1 gene was not identified in any of the Colletotrichum genomes considered in this study, either at the MAT1 locus or elsewhere in the genome.
The α-pheromone gene has undergone multiple, independent loss events
The α-pheromone gene was identified in 18 of the 45 Colletotrichum species and both Verticillium species considered in this study (Table S1). All of the identified α-pheromone genes encoded proteins that exemplified the expected structure of the ascomycete α-pheromone factor. In this regard, they possessed a hydrophobic region and signal peptide at the N-terminus. Furthermore, they harboured numerous KEX2 processing sites (KR) as well as one or more repeats of the mature, 11 aa pheromone factor (Table S1).
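The structural features listed above lend themselves to a simple sequence screen. The following is a minimal sketch, not the procedure used in this study: it counts KR dipeptides (KEX2 sites) and exact copies of a mature peptide in a precursor protein. The sequences `MATURE`, `SIGNAL` and `PRECURSOR` are hypothetical placeholders and are not taken from Table S1.

```python
def count_kex2_sites(protein: str) -> int:
    """Count KR dipeptides, the KEX2 processing sites described in the text."""
    return sum(1 for i in range(len(protein) - 1) if protein[i:i + 2] == "KR")


def count_mature_repeats(protein: str, mature: str) -> int:
    """Count non-overlapping exact copies of the mature pheromone peptide."""
    return protein.count(mature)


# Hypothetical sequences for illustration only; not taken from Table S1.
MATURE = "WCAGPGQSCWA"        # placeholder 11 aa mature peptide
SIGNAL = "MKFSLLAVAALASTAVA"  # placeholder hydrophobic N-terminal region
PRECURSOR = SIGNAL + "KR" + MATURE + "EAKR" + MATURE + "KR"

print("KEX2 sites:", count_kex2_sites(PRECURSOR))
print("mature repeats:", count_mature_repeats(PRECURSOR, MATURE))
```

Exact string matching is a deliberate simplification; real repeats may differ by one or two residues, in which case an alignment-based search would be more appropriate.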
The gene encoding the α-pheromone was found at a fairly conserved location within the genome of these species, although multiple rearrangements had taken place at this locus (Fig. 2). The region downstream of the pheromone gene was highly conserved, with a DUF1640 domain containing protein present in all species considered. Various other genes encoding known, uncharacterized and hypothetical proteins (Fig. 2, Table S2) were also found in species that do not harbour a gene encoding for the pheromone.
In contrast to the conserved nature of the downstream region, the genes found upstream of the α-pheromone varied amongst species (Fig. 2). An integral membrane protein as well as a major facilitator superfamily transporter were present in all species considered, but their orientation and location differed for the different species complexes. The presence of a FAD binding domain containing protein was also conserved at this locus, although it had been lost in Colletotrichum orchidophilum as well as all the species in the acutatum species complex.
The ancestral state reconstruction analyses showed that the α-pheromone was likely present in the ancestor of Colletotrichum and Verticillium (Fig. 3). At least 10 independent genetic events have occurred at the α-pheromone locus during the evolutionary trajectory of Colletotrichum, each of which resulted in the loss of the α-pheromone (Fig. 4). These events included small changes such as point mutations disrupting the start codon or introducing a premature stop codon. Additionally, larger translocations, insertions and/or deletions also occurred and resulted in completely different genes being encoded at this locus.
The first predicted loss event occurred at the split between the orbiculare species complex and all other species of Colletotrichum. In all four species residing in this complex, the α-factor pheromone was absent, and instead, two genes encoding hypothetical proteins are present (Fig. 2, Fig. 4). Despite this, the locus was not remarkably different in length. There is also no significant similarity between the region encoding the α-pheromone in any of the other Colletotrichum species and those of the orbiculare complex. It is thus likely that translocation removed the α-factor pheromone gene and its immediate flanking sequences and replaced these with the two hypothetical protein genes.
The second loss event occurred within the truncatum species complex (Fig. 3). While the C. truncatum 1 and C. truncatum 3 genomes harboured a gene encoding the α-pheromone, the C. truncatum 2 (formerly Colletotrichum capsici) genome did not. In fact, in C. truncatum 2, genes encoding for the FAD-linked oxidoreductase and major facilitator transporter proteins were not present on the same contig and are flanked by other genes. Additionally, neither the α-factor pheromone gene nor the DUF1640 domain containing protein gene could be identified in the C. truncatum 2 genome. It is thus not clear how the gene was lost, but it could have been a large rearrangement or translocation coupled with a deletion.
The loss of the α-pheromone gene in Colletotrichum coccodes represented the third loss of this gene during the evolution of Colletotrichum (Fig. 3). While the locus was conserved between C. coccodes and closely related species such as Colletotrichum higginsianum that do harbour the gene, a point mutation has disrupted the start codon of the putative C. coccodes α-factor pheromone gene (Fig. 4). The start codon was mutated from ATG to ACG and would likely not support translation of the protein. Furthermore, the region downstream of the disrupted start codon did not code for any recognizable mature repeats, despite harbouring a number of potential KEX2 cutting sites. It is thus likely that this gene is undergoing pseudogenization.
A fourth loss event has occurred in Colletotrichum incanum, which is accommodated in the spaethianum complex (Fig. 3). The pheromone region (from integral membrane protein to DUF1640 domain containing protein) was almost 20 kb in C. incanum and only just over 16 kb in Colletotrichum tofieldiae. Despite the much smaller locus, an intact α-pheromone gene was identified from the C. tofieldiae locus. Interestingly, the encoded protein was much smaller than that of the other Colletotrichum species, at only 91 aa long (compared to 191–335 aa in the others). Furthermore, this protein harbours only a single mature pheromone repeat, compared to the two to seven repeats in the other species (Table S1). It was particularly relevant that the terminal 43 nt at the 3’ region of the C. tofieldiae α-pheromone gene were well conserved in C. incanum (Fig. 5). However, the 5’ region of this gene was not conserved between the two species and no definable start codon was present in C. incanum. It is therefore possible that there has been an insertion or translocation into the C. incanum locus that has removed the 5’ region of the α-pheromone gene in this species.
The α-pheromone gene was also lost by species residing in the caudatum and graminicola species complexes (Fig. 3). In these species, an entirely different gene was present at this locus. In almost all of the species, a gene encoding a tetratricopeptide (TTP) protein was present, while this locus in Colletotrichum falcatum encoded a hypothetical protein (Fig. 4). Similar to the loss in the orbiculare complex species, it is probable that a translocation resulted in the deletion of the α-pheromone factor gene and the insertion of new sequences.
Further gene loss events occurred within the acutatum species complex (Fig. 3, Fig. 4). The first of these events occurred in Colletotrichum godetiae, where a single nucleotide polymorphism (SNP) led to a nonsense mutation. The putative protein has thus been truncated to only 43 aa. The next two loss events occurred in C. fioriniae and C. acutatum and both involved the disruption of the start codon, which had been mutated to ATA and ACG, respectively. The final loss event left no genetic imprint and was likely a similar event to the loss of the gene in the caudatum, graminicola and orbiculare species complexes. Interestingly, the genes found at this locus were not conserved, suggesting that multiple disruptions have occurred at this locus.
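The two kinds of point mutation described for these loss events (a disrupted start codon and an in-frame premature stop codon) can be recognised with a short check on the coding sequence. The sketch below is illustrative only: `classify_orf` is a hypothetical helper, and the toy sequences are invented, with only the mutated start codons ATA and ACG taken from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_orf(cds: str) -> str:
    """Classify a putative coding sequence read in frame from its first base."""
    cds = cds.upper()
    if cds[:3] != "ATG":
        return "start codon disrupted"
    # Keep complete codons only; trailing bases that do not fill a codon are ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Any stop codon before the final codon is treated as premature.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return f"premature stop at codon {index + 1}"
    if codons[-1] not in STOP_CODONS:
        return "no terminal stop codon"
    return "intact"


# Toy sequences (hypothetical); only the start-codon variants come from the text.
for name, seq in [
    ("intact", "ATGGCTAAATGA"),
    ("ATG to ACG", "ACGGCTAAATGA"),
    ("ATG to ATA", "ATAGCTAAATGA"),
    ("nonsense", "ATGGCTTAAAAATGA"),
]:
    print(name, "->", classify_orf(seq))
```

A check of this kind only flags candidate pseudogenes; it cannot distinguish a true loss from an annotation or assembly error, so manual inspection of the locus remains necessary.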
The α-factor pheromone flanking genes in Colletotrichum sansevieriae were identified at the ends of two different contigs that could not be joined. Therefore, a pheromone was not detected in this species. However, the genes as well as the existing intergenic regions are highly conserved between C. sansevieriae and Colletotrichum karstii, suggesting that the pheromone is likely present (Fig. 3).
The a-factor pheromone exhibits less extreme patterns of gene loss
The gene encoding the a-factor pheromone was identified in a total of 28 Colletotrichum species and both Verticillium species and was found in the same genomic location in all of them. The gene found directly upstream of the pheromone encodes a cyanate hydratase protein (CHP), while the gene directly downstream encodes a eukaryotic rRNA processing protein (EBP2). In most of the 17 species where the a-factor pheromone has been lost, these flanking genes occurred next to one another in the same orientation as in the species that retained the pheromone gene.
The defining features of the a-factor pheromone are its short length and the presence of a highly conserved C-terminal CaaX domain. In the Colletotrichum species, the sequence of this conserved domain was either CVIL or CVVM (Table S3). Although the genes all encode short proteins, there was some variation in the size of the a-factor pheromone protein. For example, Colletotrichum musicola and Colletotrichum pluvivorum produced the longest proteins at 109 aa, while species belonging to the orbiculare species complex produced a protein of only 35 aa.
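A C-terminal CaaX motif of this kind can be detected with a regular expression. The sketch below is an illustration under stated assumptions: the set of residues accepted as "aliphatic" (A, V, I, L) is an assumption made here, and the example proteins are hypothetical apart from the CVIL and CVVM endings reported in the text.

```python
import re

# C, two aliphatic residues, then any residue, anchored at the C-terminus.
# Which residues count as aliphatic (A, V, I, L) is an assumption of this sketch.
CAAX = re.compile(r"C[AVIL]{2}[A-Z]$")


def caax_motif(protein: str):
    """Return the C-terminal CaaX motif, or None if the protein lacks one."""
    match = CAAX.search(protein.upper())
    return match.group(0) if match else None


# Hypothetical proteins ending in the two motifs named in the text.
print(caax_motif("MAPSTQNGKCVIL"))
print(caax_motif("MAPSTQNGKCVVM"))
print(caax_motif("MAPSTQNGK"))  # truncated before the motif
```

Anchoring the pattern with `$` matters here, because a protein truncated by a premature stop codon may still contain cysteine-rich stretches elsewhere without carrying a functional C-terminal motif.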
A parsimony-based ancestral state reconstruction showed that the a-factor pheromone was present in the ancestor of Colletotrichum and Verticillium, but that it has subsequently been lost as the species of Colletotrichum have evolved (Fig. 7). Although the ancestral state reconstruction showed a total of four loss events, the genetic imprints left during these losses suggest that at least eight independent deletion events have occurred. These events are similar to those that occurred for the α-factor pheromone.
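To make the idea of a parsimony-based reconstruction concrete, the sketch below applies Fitch parsimony to a binary presence/absence character. Fitch parsimony is used here only as one common parsimony method; the text does not state which algorithm or software was used. The tree and the taxon names are hypothetical and do not reproduce the phylogeny in Fig. 7.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree   : nested 2-tuples whose leaves are taxon names (strings)
    states : dict mapping each taxon name to 1 (gene present) or 0 (absent)
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change is needed.
        return shared, left_cost + right_cost
    # The children disagree: one change is required on a descending branch.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical toy tree: two outgroup taxa and four ingroup taxa.
TREE = (("out1", "out2"), (("A", "B"), ("C", "D")))
STATES = {"out1": 1, "out2": 1, "A": 1, "B": 0, "C": 1, "D": 1}

root_states, changes = fitch(TREE, STATES)
print("root state set:", root_states, "minimum changes:", changes)
```

Parsimony returns the minimum number of state changes needed to explain the tip states, so a single inferred loss on an internal branch can conceal several independent disruptions in the descendant lineages. This is consistent with the sequence-level evidence pointing to more events than the reconstruction alone.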
The first loss of the a-factor pheromone occurred in Colletotrichum sojae (Fig. 6). Although the C. sojae locus showed high levels of similarity to this locus in other species residing in the orcidearum species complex, a 7 nt deletion in the gene-encoding region resulted in a frameshift mutation. The mutation disrupted the CaaX domain and stop codon. Therefore, any protein encoded by this putative pheromone gene would likely not function as required.
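The effect of a 7 nt deletion on the reading frame can be shown with a toy example. In the sketch below, `GENE` is a hypothetical sequence built to end in codons for C-V-I-L followed by a stop codon; it is not the C. sojae sequence. Because 7 is not a multiple of 3, every codon downstream of the deletion is read out of frame, and both the CaaX codons and the stop codon are lost.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, reading from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def keeps_frame(deletion_length: int) -> bool:
    """A deletion preserves the reading frame only if its length is a multiple of 3."""
    return deletion_length % 3 == 0


# Hypothetical toy gene: ATG, four GCT codons, TGT GTT ATT CTT (C-V-I-L), TAA stop.
GENE = "ATG" + "GCT" * 4 + "TGTGTTATTCTT" + "TAA"
MUTANT = delete(GENE, 3, 7)

print("wild type:", codons(GENE))
print("7 nt deletion:", codons(MUTANT))
print("frame kept after 7 nt deletion:", keeps_frame(7))
```

By the same arithmetic, a deletion whose length is a multiple of three, such as 15 nt, removes whole codons and leaves the downstream frame intact.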
The next a-factor pheromone loss occurred in C. truncatum 2 (Fig. 6). While the upstream flanking gene, CHP, is present, the downstream flanking gene is unidentifiable in the C. truncatum 2 genome. This locus has thus been entirely disrupted in C. truncatum 2, leading to the loss of the a-factor pheromone as well. Similarly, no gene encoding this pheromone could be found in C. chlorophyti, representing the third loss of this gene in Colletotrichum (Fig. 6). In this case, the up- and downstream flanking genes are conserved, yet the a-factor pheromone could not be identified in this region or elsewhere in the genome.
The loss of the a-factor pheromone gene in C. coccodes represented the fourth loss of this gene in Colletotrichum species (Fig. 6, Fig. 7). In this case, a mutation disrupted the start codon of the gene, changing the start codon from ATG to AGG. The remaining gene sequence was similar to that of closely related species but does not encode for a protein with the terminal CaaX domain.
The final pheromone loss occurred when the acutatum species complex branched off from the rest of the genus (Fig. 6). However, genetic analysis showed that at least four different deletion events have taken place in this species complex (Fig. 7). The first of these occurred in C. godetiae, Colletotrichum phormii and Colletotrichum salicis, where a disruption of a start codon was responsible for the loss of this gene. In this case, the start codon was mutated from ATG to GCG and has thus undergone at least two mutation events. Interestingly, a gene sequence similar to that of other species’ a-factor pheromones was detectable after the disrupted start codon and a CaaX domain can be identified. Although there was an alternative start codon, it was found halfway through the gene sequence. Furthermore, there was no stop codon encoded after the CaaX domain and thus any potentially translated protein would be very different from a typical a-factor pheromone.
The second loss to occur in the acutatum species complex involves the a-factor pheromone gene of C. acutatum (Fig. 7). In this case, a point mutation has introduced a premature stop codon into the gene sequence, which would prematurely terminate translation of the putative protein before the terminal CaaX domain, producing a protein of only 31 aa. Interestingly, the third loss involved a similar mutation and occurred in many of the other species residing in the acutatum species complex (Fig. 7). In these species, the premature stop codon occurred at position 35. Lastly, the gene was lost in Colletotrichum simmondsii due to another disruption of the start codon, which was mutated from ATG to ACG (Fig. 7).
The orbiculare species complex is worth mentioning here, although this gene has not been entirely lost in these species. Instead, the a-factor pheromone locus in these species has undergone a deletion of 15 bp that corresponds to the first five amino acids of the a-factor pheromone (Fig. 7). An alternative start codon was available, and since no functional domain is known from the N-terminus of this protein and the C-terminal CaaX domain is still intact, this protein was likely to act as a functional pheromone. However, this would need to be confirmed experimentally.
The pheromone receptors display similar patterns of presence and absence as their pheromone factors
In a similar manner to their respective pheromones, genes encoding the pheromone receptors could be identified in the genomes of only a subset of all Colletotrichum species. In almost all cases, the presence (or absence) of a pheromone gene correlated with the presence (or absence) of a gene encoding the pheromone’s cognate receptor. The receptors, therefore, showed similar patterns of gene presence and absence as their pheromones (Fig. 3, Fig. 6). However, there were a few exceptions to this pattern.
With regards to the α-factor pheromone and its cognate receptor (preB), the exceptions included: C. truncatum 1 and Colletotrichum tofieldiae, where the pheromone was present while the receptor was absent, and C. godetiae, where the pheromone was absent while the receptor was present (Fig. 3). When the a-factor and its cognate receptor (preA) are considered, three exceptions exist. C. sojae, C. coccodes and C. truncatum 2, while harbouring an intact pheromone receptor, do not possess the corresponding pheromone’s gene (Fig. 6).
In species where the pheromone receptors were absent, these genes had been lost in similar ways to the pheromone genes. This was particularly true in the case of the α-factor receptor (preB), where disruptions to the start codon and the presence of in-frame, premature stop codons were common across the genus (Fig. 8). Interestingly, these genes have also experienced small to large indel events, a few of which resulted in frameshift mutations that led to premature stop codons. Similar genetic events could not be identified at the locus harbouring the a-factor receptor (preA), and it is thus likely that these genes were lost via translocation, insertion and/or deletion events that left no identifiable trace in the genome.
Colletotrichum species show highly varied levels of genome wide RIP
Almost all of the genomes considered in this study showed some level of RIP (Table 1), which varied greatly from 0.44% (Colletotrichum caudatum) to 56.02% (Colletotrichum trifolii). The only species that displayed no genomic evidence of RIP was Colletotrichum sublineola. In general, species residing in the same species complexes showed similar RIP profiles.
Table 1. The RIP profiles and GC content of the genomes used in this study
Species | RIP (%) | GC content (%) | # of LRAR | GC content of LRARs (%)
acutatum species complex
C. nymphaeae | 0.93 | 52.75 | 942 | 30.11
C. simmondsii | 3.97 | 51.73 | 26 | 24.44
C. costarricense | 5.06 | 51.35 | 118 | 23.24
C. paranaense | 2.11 | 52.23 | 34 | 23.55
C. lupini | 19.6 | 47.58 | 879 | 26.72
C. tamarilloi | 5.51 | 51.2 | 175 | 22.59
C. cuscutae | 41.1 | 41.34 | 2037 | 25.99
C. melonis | 2.41 | 52.15 | 51 | 24.01
C. abscissum | 5.93 | 51.11 | 180 | 23.49
C. fioriniae | 1.42 | 52.49 | 11 | 35.53
C. phormii | 1.36 | 52.74 | 25 | 42.47
C. salicis | 0.99 | 52.78 | 3 | 38.52
C. godetiae | 2.89 | 52.09 | 61 | 26.97
C. acutatum | 2.71 | 52.25 | 45 | 25.89
caudatum species complex
C. somersetense | 14.57 | 50.05 | 339 | 26.41
C. caudatum | 0.44 | 54.95 | 0 | 0
C. zoysiae | 3.99 | 53.76 | 152 | 25.13
graminicola species complex
C. navitas | 18.31 | 48.63 | 246 | 24.82
C. graminicola | 18.09 | 48.42 | 545 | 25.15
C. sublineola | 0 | 53.11 | 0 | 0
C. eremochloae | 3.35 | 52.54 | 66 | 25.67
C. falcatum | 10.01 | 52.78 | 230 | 27.76
C. cereale | 9.9 | 52.29 | 230 | 29.04
spaethianum species complex
C. tofieldiae | 4.8 | 52.99 | 110 | 28.12
C. incanum | 4.64 | 51.9 | 117 | 33.34
destructivum species complex
C. higginsianum | 3.27 | 54.41 | 117 | 25.89
C. shisoi | 35.92 | 46.26 | 284 | 28.8
C. tanaceti | 24.1 | 49.19 | 674 | 31.1
No assigned complex
C. orchidophilum | 6.22 | 51.06 | 92 | 18.58
C. coccodes | 1.31 | 53.79 | 0 | -
C. chlorophyti | 10.85 | 49.8 | 214 | 25.8
gloeosporioides species complex
C. fructicola | 1.37 | 53.2 | 39 | 19.7
C. asianum | 14.32 | 49.37 | 485 | 23.58
C. camelliae | 10.76 | 50.01 | 361 | 20.9
C. gloeosporioides | 2.61 | 51.98 | 107 | 19.7
boninense species complex
C. karstii | 3.75 | 52.69 | 101 | 23.4
C. sansevieriae | 11.53 | 50.74 | 38 | 34.88
truncatum species complex
C. truncatum (1) | 3.9 | 49.73 | 62 | 27.49
C. truncatum (2) (previously C. capsici) | 5.19 | 48.26 | 123 | 26.72
C. truncatum (3) | 2.85 | 50.12 | 122 | 35.63
orcidearum species complex
C. musicola | 5.72 | 54.94 | 133 | 34.87
C. pluvivorum | 0.99 | 55.84 | 15 | 24.47
C. sojae | 1.02 | 55.9 | 14 | 30.3
orbiculare species complex
C. sidae | 47.37 | 37.96 | 613 | 19.62
C. orbiculare | 49.53 | 36.47 | 1067 | 17.91
C. trifolii | 56.02 | 35.24 | 1045 | 20.85
C. spinosum | 45.32 | 38.7 | 588 | 19.36
Verticillium (Outgroup)
V. tricorpus | 2.12 | 57.24 | 32 | 32.38
V. dahliae | 2.59 | 55.34 | 24 | 28.9
Table 2. Genomes used in this study
Species | Strain a | Database | Accession Number/Project ID b | Reference/Citation
acutatum species complex
C. nymphaeae | SA-01 | JGI | - | N/A
C. simmondsii | CBS 122122 | JGI | - | N/A
C. costarricense | IMI 309622 | JGI | - | N/A
C. paranaense | IMI 384185 | JGI | - | N/A
C. lupini | CBS 109225 | JGI | - | N/A
C. tamarilloi | CBS 129955 | JGI | - | N/A
C. cuscutae | IMI 304802 | JGI | - | N/A
C. melonis | CBS 134730 | JGI | - | N/A
C. abscissum | IMI 504890 | JGI | - | N/A
C. fioriniae | MH 18 | JGI | Project ID: 1006306 | N/A
C. phormii | CBS 102054 | JGI | Project ID: 1060996 | N/A
C. salicis | CBS607.94 | JGI | - | N/A
C. godetiae | CBS 193.32 | JGI | Project ID: 1060998 | N/A
C. acutatum | CBS 112980 | JGI | Project ID: 1061000 | N/A
caudatum species complex
C. somersetense | CBS 131599 | JGI | Project ID: 1043133 | N/A
C. caudatum | CBS 131602 | JGI | Project ID: 1006149 | N/A
C. zoysiae | MAFF 235873 | JGI | Project ID: 1043139 | N/A
graminicola species complex
C. navitas | CBS 125086 | JGI | - | N/A
C. graminicola | M1.001 | JGI | - | N/A
C. sublineola | CBS 131301 | JGI | Project ID: 1006165 | N/A
C. eremochloae | CBS 129661 | JGI | Project ID: 1043109 | N/A
C. falcatum | MAFF 306170 | JGI | Project ID: 1043115 | N/A
C. cereale | CBS 129662 | JGI | Project ID: 1043097 | N/A
spaethianum species complex
C. tofieldiae | CBS 168.49 | JGI | - | N/A
C. incanum | MAFF 238712 | JGI | - | N/A
destructivum species complex
C. higginsianum | IMI 349063 | NCBI | LTAN00000000.1 | (28)
C. shisoi | PG-2018a | NCBI | PUHP00000000.1 | (59)
C. tanaceti | BRIP 57314 | NCBI | PJEX00000000.1 | (35)
No assigned complex
C. orchidophilum | IMI 309357 | NCBI | MJBS00000000.1 | (60)
C. coccodes c | NJ-RT1 | NCBI | LECQ00000000.1 | USDA-ARS
C. chlorophyti | NTL11 | NCBI | MPGH00000000.1 | (61)
gloeosporioides species complex
C. fructicola | CGMCC 3.17371 | NCBI | SSNE00000000.1 | (62)
C. asianum | ICMP 18580 | NCBI | WOWK00000000.1 | (63)
C. camelliae c | CcLH18 | NCBI | JAATWK000000000.1 | Central South University of Forestry & Technology
C. gloeosporioides | 23 | JGI | Project ID: 1006302 | N/A
boninense species complex
C. karstii c | CkLH20 | NCBI | JAATWM000000000.1 | Central South University of Forestry & Technology
C. sansevieriae c | Sa-1-2 | NCBI | NJHP00000000.1 | (64)
truncatum species complex
C. truncatum (1) c | MTCC 3414 | NCBI | NBAU00000000.2 | (65)
C. truncatum (2) (previously C. capsici) c | KLC.C-4 | NCBI | JAATLN000000000.1 | KL University
C. truncatum (3) c | CMES1059 | NCBI | VUJX00000000.1 | (66)
orcidearum species complex
C. musicola c | LFN0074 | NCBI | WIGM00000000.1 | (66)
C. pluvivorum c | LFN00145 | NCBI | WIGO00000000.1 | (66)
C. sojae c | LFN0009 | NCBI | WIGN00000000.1 | (66)
orbiculare species complex
C. sidae | CBS 815.97 | NCBI | QAPF00000000.1 | (67)
C. orbiculare | MAFF 240422 | NCBI | AMCV00000000.2 | (67)
C. trifolii | 543-2 | NCBI | RYZW00000000.1 | (67)
C. spinosum | CBS 515.97 | NCBI | QAPG00000000.1 | (67)
Verticillium (Outgroup)
V. tricorpus | MUCL 9792 | NCBI | JPET00000000.1 | (68)
V. dahliae | VdLs.17 | NCBI | ABJE00000000.1 | (69)
a Isolate information is not always available from NCBI or JGI
b Genomes from the JGI where project numbers are not yet available are indicated as -
c Genomes which needed to be annotated using AUGUSTUS
The overall GC content of the Colletotrichum genomes was fairly consistent, at around 50% in most of the genomes considered. However, in species where there was significant RIP (> 30%), the GC content was considerably lower. For example, members of the orbiculare species complex had some of the highest levels of RIP (45–56%) and, correspondingly, the lowest GC levels (35–38%).
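The relationship between RIP and GC content can be checked directly against the tabulated values. The sketch below is a simple illustration, not an analysis performed in this study: it takes the six genomes with RIP above 30% and, as an arbitrary comparison set chosen here, five genomes with lower RIP, and compares their mean genome GC content. All values are copied from the RIP profile table.

```python
from statistics import mean

# (species, RIP %, genome GC %) copied from the RIP profile table.
# The first six rows are all genomes with RIP > 30%; the remaining five
# are an arbitrary subset of the lower-RIP genomes chosen for illustration.
ROWS = [
    ("C. cuscutae", 41.10, 41.34),
    ("C. shisoi", 35.92, 46.26),
    ("C. sidae", 47.37, 37.96),
    ("C. orbiculare", 49.53, 36.47),
    ("C. trifolii", 56.02, 35.24),
    ("C. spinosum", 45.32, 38.70),
    ("C. nymphaeae", 0.93, 52.75),
    ("C. caudatum", 0.44, 54.95),
    ("C. graminicola", 18.09, 48.42),
    ("C. higginsianum", 3.27, 54.41),
    ("C. sojae", 1.02, 55.90),
]

RIP_THRESHOLD = 30.0  # the "> 30%" cut-off used in the text

high = [gc for _, rip, gc in ROWS if rip > RIP_THRESHOLD]
low = [gc for _, rip, gc in ROWS if rip <= RIP_THRESHOLD]

print(f"mean GC, RIP > 30%:  {mean(high):.2f}")
print(f"mean GC, RIP <= 30%: {mean(low):.2f}")
```

A comparison of group means is only a first look; a correlation computed over all genomes in the table would give a fuller picture of how closely GC content tracks RIP.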