Genome-wide investigation and analysis of microsatellites and compound microsatellites in Leptolyngbya species, Cyanobacteria

Abstract: Microsatellites (simple sequence repeats, SSRs) are ubiquitously distributed in almost all known genomes. Here, the first investigation was designed to examine SSRs and compound microsatellites (CSSRs) in 36 genomes of Leptolyngbya. The results revealed diverse patterns of distribution, abundance, density and diversity of SSRs and CSSRs in Leptolyngbya genomes. The numbers of SSRs and CSSRs were extremely unevenly distributed among genomes, ranging from 11,086 to 27,292 and from 286 to 1,102, respectively. Mononucleotide SSRs were the most abundant category in 14 genomes, while the other 22 genomes followed the pattern di- > mono- > trinucleotide SSRs. Both SSRs and CSSRs were overwhelmingly distributed in coding regions. The numbers of SSRs and CSSRs were significantly correlated with genome size (P < 0.01) but not with GC content (P > 0.05). Moreover, the motifs (A/T)n and (AG)n were predominant in mononucleotide and dinucleotide SSRs, respectively, and unique CSSR motifs were identified in 33 genomes. This study provides the first insight into SSRs and CSSRs in Leptolyngbya genomes and may support their future use as molecular markers in closely related species.


Leptolyngbya, which often thrive in thermal environments, are ecologically important cyanobacteria in light of their crucial role in energy metabolism and matter cycling in ecosystems (Amin et [...] National Center for Biotechnology Information (NCBI), offering an opportunity for SSR discovery at the genomic level. To our knowledge, a genome-wide survey of SSRs and CSSRs is unavailable for Leptolyngbya genomes. The present study was designed to mine and analyze SSRs and CSSRs, and to further reveal the patterns of distribution, abundance, density and diversity of SSRs and CSSRs in Leptolyngbya genomes. This study provides the first insight into SSRs and CSSRs in Leptolyngbya genomes and may be useful for the future development of molecular markers.

According to the genomic resources of NCBI at the time of this study, a total of 36 genomes of Leptolyngbya strains were retrieved as data for SSR and CSSR analysis. Information regarding these genomes is summarized in Table 1 and Table S1. In addition, genomic annotations of the 36 Leptolyngbya genomes were downloaded for the corresponding analyses.

To illustrate the relationships among the strains studied, multi-locus sequence analysis (MLSA) was performed using concatenated sequences of 15 genes from each genome. These genes were frr, pgk, [...]

The Pearson correlation coefficient (ρ) was calculated using a custom R script to uncover the [...] was statistically evaluated by an index, Z (Jan 2006). Z scores were computed using the following equations: [...] (1) where n, number of genomes studied (n = 36); i, genome order; ncSSR_i, number of cSSR in genome;

An extremely uneven distribution of SSR numbers was observed among genomes, ranging from 11,086 to 27,292. The relative abundance (RA) and relative density (RD) both differed considerably among Leptolyngbya genomes (Table 1), ranging from 2.00 to 3.64/kb and from 13.20 to 24.21 bp/kb, respectively. However, RA and RD were highly consistent within the subgroups (Table 1), e.g.
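The units reported for RA (/kb) and RD (bp/kb) are consistent with the definitions commonly used in SSR surveys: RA as the number of SSRs per kb of analyzed sequence, and RD as the total SSR length in bp per kb of analyzed sequence. The following minimal sketch assumes those definitions; the function names and the example values are hypothetical and are not taken from Table 1.

```python
def relative_abundance(n_ssr: int, genome_bp: int) -> float:
    """Number of SSRs per kb of analyzed sequence (assumed definition of RA)."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return n_ssr / (genome_bp / 1000)


def relative_density(total_ssr_bp: int, genome_bp: int) -> float:
    """Total SSR length (bp) per kb of analyzed sequence (assumed definition of RD)."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return total_ssr_bp / (genome_bp / 1000)


# Hypothetical genome: 5 Mb, 12,500 SSRs covering 80,000 bp in total.
ra = relative_abundance(12_500, 5_000_000)   # SSRs per kb
rd = relative_density(80_000, 5_000_000)     # bp of SSR per kb
print(f"RA = {ra:.2f}/kb, RD = {rd:.2f} bp/kb")
```

Normalizing by sequence length in this way is what allows genomes of different sizes to be compared directly.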

The number of cSSR in each genome (ncSSR) ranged from 580 to 2,303 ( [...] The complexity of CSSRs in the 36 genomes ranged from 2 to 8 (Table S3), except for one CSSR with an extremely high complexity of 28. The vast majority of CSSRs had a complexity of 2, accounting for 92.66% of all CSSRs (Table S3).
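The notion of complexity can be made concrete with a short sketch. It assumes that a CSSR is formed when two or more adjacent SSRs are separated by no more than a maximum distance (called `dmax` here), and that complexity is the number of SSRs in the compound. The `dmax` value, the coordinates and the function names below are hypothetical; the threshold used in this study is not stated in the surviving text.

```python
from collections import Counter
from typing import List, Tuple

SSR = Tuple[int, int]  # (start, end), 1-based inclusive coordinates


def group_compound_ssrs(ssrs: List[SSR], dmax: int) -> List[List[SSR]]:
    """Group adjacent SSRs whose intervening gap is <= dmax bp.

    Only groups with at least two SSRs are returned as compound SSRs.
    """
    groups: List[List[SSR]] = []
    current: List[SSR] = []
    for ssr in sorted(ssrs):
        # Gap = number of bases strictly between the previous SSR and this one.
        if current and ssr[0] - current[-1][1] - 1 <= dmax:
            current.append(ssr)
        else:
            if len(current) >= 2:
                groups.append(current)
            current = [ssr]
    if len(current) >= 2:
        groups.append(current)
    return groups


def complexity_distribution(cssrs: List[List[SSR]]) -> Counter:
    """Count how many compound SSRs have each complexity (SSRs per compound)."""
    return Counter(len(group) for group in cssrs)


# Hypothetical SSR coordinates on one sequence, with an assumed dmax of 10 bp.
ssrs = [(10, 21), (25, 36), (200, 211), (215, 226), (230, 241), (500, 511)]
cssrs = group_compound_ssrs(ssrs, dmax=10)
n_cssr_members = sum(len(group) for group in cssrs)  # SSRs that belong to a compound
print(len(cssrs), n_cssr_members, dict(complexity_distribution(cssrs)))
```

Under these assumptions, the count of SSRs that fall inside compounds is the sum of the complexities, which is why it is always at least twice the number of CSSRs.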

Like SSRs, CSSRs were predominantly located in coding regions of all 36 genomes analyzed (Fig. 3). The distribution pattern of SSRs and CSSRs obtained in the present study [...]

The genome sizes of the 36 Leptolyngbya strains ranged from 3.9 Mb to 9.4 Mb (Table 1). The correlation [...] 0.01) (Table 2), although in several cases smaller genomes contained more SSRs or CSSRs (Table 1).
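The relationship between genome size and SSR counts is assessed with the Pearson correlation coefficient, which the authors computed with a custom R script that is not shown. The following is a minimal Python sketch of the same statistic, not the authors' script; the function name and the five data points are invented for illustration and do not come from Table 1 or Table 2.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))   # unnormalized covariance
    var_x = sum(a * a for a in dx)             # unnormalized variances
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented values: genome size (Mb) and SSR count for five hypothetical genomes.
genome_size_mb = [4.0, 5.5, 6.0, 7.5, 9.0]
n_ssr = [12000, 15000, 17000, 21000, 26000]
r = pearson_r(genome_size_mb, n_ssr)
print(f"r = {r:.3f}")
```

The sketch returns only the coefficient. A significance test such as the P values reported here would need an additional step, for example a t-test on r with n - 2 degrees of freedom.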

The GC content of the Leptolyngbya genomes varied from 43.87% to 59.77% (Table 1). Interestingly, GC content was not significantly correlated with either nSSR or nCSSR (ρ = -0.22/0.08, P > 0.05). Nevertheless, genomic GC content might influence the GC content of SSRs, which in turn affects marker development because GC-rich SSRs are difficult to amplify by PCR. In this study, SSRs of Leptolyngbya genomes appeared to be AT-rich (Fig. S1), which might be valuable in the development of SSR markers.

The complexity analysis of CSSRs in the Leptolyngbya genomes showed that these CSSRs primarily comprised two SSRs (complexity = 2) (Table S3) [...] (Table S4). These unique motifs were possibly shaped by two factors. First, the diverse SSR types in each genome generated various motifs (SSR-couple). Second, mutations within SSRs are reported to be frequent (Xu and Peng 2000). The surveyed Leptolyngbya genomes came from diverse niches (Table S1) and could readily have accumulated diverse mutations during evolution. This hypothesis is supported by the unique motifs obtained in this study, which differed from each other by just one or two mutations.

The SSRs and CSSRs identified in this study were predominantly distributed in coding regions of each genome (Fig. 2b, Fig. 3). This result indicated a potential functional role of SSRs and CSSRs in

In conclusion, a thorough survey was completed to disclose the patterns of distribution, abundance,

The authors declare that they have no conflict of interest.

Figure 2 The SSR distribution patterns in 36 Leptolyngbya genomes. a Distribution of SSR repeat types. b SSR distribution in coding and non-coding regions

Figure 3 The distribution of CSSRs in coding and non-coding regions of 36