Selection of single guide RNAs (sgRNAs) to target genes related to the anther response to high temperature stress
A total of 112 genes were selected from our previously reported transcriptome data [39], which suggested that they play a vital role in the response of male reproductive organs to high temperature stress in cotton. The selected genes and their predicted functions according to the CottonFGD website (https://cottonfgd.org/) are listed in Supplemental Table S1. To better understand the function of these genes while saving time and labor, we generated a collection of targeted mutants using CRISPR/Cas9-mediated pooled sgRNA assembly. For the pooled assembly, 116 highly specific single guide RNAs (sgRNAs) were designed downstream of the start codon of the candidate genes. Each sgRNA target is 23 bp in length and ends with an NGG protospacer adjacent motif (PAM). Every selected sgRNA was synthesized as a DNA oligonucleotide fused to 16 bp of sequence homologous to the linear ends of the pRGEB32-GhU6.9 vector [2]. Of the 116 sgRNAs, 13 matched a single site, 83 perfectly matched exactly 2 sites, 14 targeted exactly 3 sites, and 6 targeted more than 3 sites in the cotton genome. In this study, almost all genes were targeted by more than one sgRNA directed at different coding regions (sgRNA sequences are given in Supplemental Table S2).
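The site definition above (a 23 bp target ending in NGG, then a count of perfect matches in the genome) can be illustrated with a minimal Python sketch. This is not the design tool used in the study: the function names are hypothetical, only perfect matches are counted, and the sequences in the usage note are toy examples rather than cotton sequences.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return (start, strand, site) for every 23-nt window ending in NGG.

    For the "-" strand, `start` is given in reverse-complement coordinates.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping windows are all reported.
        for m in re.finditer(r"(?=([ACGT]{21}GG))", s):
            sites.append((m.start(), strand, m.group(1)))
    return sites


def count_genome_matches(site: str, genome: str) -> int:
    """Count perfect matches of a 23-nt site on both strands of `genome`."""
    genome = genome.upper()
    pattern = re.compile(f"(?={re.escape(site.upper())})")
    forward = len(pattern.findall(genome))
    reverse = len(pattern.findall(reverse_complement(genome)))
    return forward + reverse
```

For a toy coding sequence such as "ATG" + "ACGT" * 5 + "AGG", `find_target_sites` reports one forward-strand site starting at position 3; `count_genome_matches` then indicates whether a guide would fall into the single-site, two-site, three-site or multi-site class described above.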
Optimal construction of pooled sgRNA assemblies led to high coverage of sgRNAs
Constructing a vector for each individual gene is costly and time-consuming. To improve vector construction efficiency, the 116 designed sgRNAs were divided into 4 assembly groups (29 sgRNAs per group). The sgRNA oligos of each group were pooled in equal amounts and amplified by PCR (Figure 1A); the PCR product and the linearized pRGEB32-GhU6.9 vector (Figure 1B) were then mixed and fused to form the recombinant vector (Figure 1C and D). The constructs resulting from the In-Fusion reaction were introduced into competent bacterial host cells (Top10), and at least 200 positive single colonies were harvested (Figure 1E and F). Plasmids from the bacterial cells harboring the constructed vector were then introduced into Agrobacterium (GV3101) (Figure 1G). Each assembly group yielded no fewer than 500 Agrobacterium single colonies. Finally, all grown colonies were collected and used for high-throughput sequencing to confirm that all targeted sgRNAs were included in the pooled assembly (Figure 1H).
Large-scale sequencing in Agrobacterium to check sgRNA coverage during pooled construction
Coverage and uniformity of the constructed vectors are the main factors for successful Agrobacterium-mediated genetic transformation of plants. Therefore, all colonies grown on the culture plates were collected for plasmid extraction. Because the constructed plasmids were identical except for the 23 bp spacer sequence that distinguishes each vector, primers in the flanking sequences were used to amplify all of them. A region spanning 150 bp upstream and downstream of the sgRNA inserted in the pRGEB32-GhU6.9 vector was amplified from each of the four assembly groups, and the products were combined in one tube for next-generation sequencing (NGS). The sequencing results showed that all 116 constructed sgRNAs were present, i.e., sgRNA coverage was 100%. Between 2000 and 7000 sequencing reads were obtained for each sgRNA (Figure 1I). This variation in read number may be due to factors such as PCR conditions, priming, the buffer quality of the purification kit, or the sequencer. However, most of the read counts ranged between 4800 and 5300. These results indicate that pooling 29 sgRNAs in one assembly is suitable and is faster, easier and cheaper than single-vector construction.
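The coverage check described above amounts to extracting the spacer between the constant flanking sequences of each read and counting reads per designed sgRNA. The following Python sketch illustrates that calculation under simplifying assumptions: exact flank matching, no sequencing-error tolerance, and hypothetical function names and toy sequences that are not taken from the study.

```python
def extract_spacer(read: str, left_flank: str, right_flank: str):
    """Return the sequence between the two flanks, or None if a flank is missing."""
    i = read.find(left_flank)
    if i < 0:
        return None
    start = i + len(left_flank)
    j = read.find(right_flank, start)
    if j < 0:
        return None
    return read[start:j]


def coverage_summary(designed, reads, left_flank, right_flank):
    """Count reads per designed spacer and summarize library coverage."""
    counts = dict.fromkeys(designed, 0)
    for read in reads:
        spacer = extract_spacer(read, left_flank, right_flank)
        if spacer in counts:  # reads with unknown or missing spacers are ignored
            counts[spacer] += 1
    detected = sum(1 for n in counts.values() if n > 0)
    return {
        "counts": counts,
        "coverage": detected / len(counts),  # 1.0 means every sgRNA was seen
        "min_reads": min(counts.values()),
        "max_reads": max(counts.values()),
    }
```

With real data, `designed` would hold the 116 spacer sequences, and a `coverage` of 1.0 together with `min_reads` and `max_reads` would correspond to the 100% coverage and the per-sgRNA read range reported above.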
Agrobacterium-mediated transformation of cotton using the pooled sgRNA assemblies
Genetic transformation of cotton was performed independently for each pooled sgRNA assembly using 150 seedlings of the cultivar Jin668 as explants. Transformants went through a series of subculture steps, starting with co-culture at 20 °C for 48 h in the dark (Figure 2A), followed by transfer to medium containing 2,4-D for callus induction (Figure 2B). At least 2000 hypocotyl segments (length ≤ 0.8 cm) were used for callus induction (about 80 Petri dishes). From the 2000 hypocotyl segments (Figure 2C), approximately 600 single positive calli (somatic embryogenesis) were transferred to differentiation medium (Figure 2D). After subculture on differentiation medium, about 200-300 normal plantlets were transferred to rooting medium (Figure 2E) and then gradually acclimated to normal conditions (Figure 2F). In total, we obtained more than 800 differentiated plantlets from all pooled sgRNA assembly groups (Figure 2G). Of these, 718 T0 plants were successfully transferred to the greenhouse (Figure 2H and I) for phenotyping and genotyping analyses.
Barcode-based high-throughput sequencing is a powerful strategy to identify inserted sgRNAs in CRISPR-Cas9 mutants
A PCR-based barcode library is a viable method that allows high-throughput sequencing of hundreds of gene loci in a single pooled sample. Barcoding strategies have been used to trace DNA or RNA originating from separate cells or individuals [40,41]. Our optimized CRISPR-Cas9 system generated a total of 718 transgenic plants harboring undefined sgRNA insertions. Detecting and identifying the inserted sgRNAs in such a large number of mutants is very challenging. Barcode sequencing is therefore preferable to Sanger sequencing because it saves time, labor and money. We designed different barcodes at the 5’ end of the primers, so that a total of 44 primers can simultaneously detect the sgRNA sites of 384 mutants. Barcode and primer sequences are shown in Supplemental Table S3. The genomic DNA of all generated plants was used for PCR-based barcode library construction in one round of amplification (Figure 3A and B). Each mutant was labeled individually with unique barcodes via PCR, and a positive PCR result indicated the presence of the T-DNA insertion harboring the sgRNA (Figure 3C). The PCR results showed that all generated mutants harbored the sgRNA insertion. After PCR confirmation, a DNA library was built by pooling the positive PCR products in equal amounts as one sample; each sample included 384 individual DNA fragments (Figure 3D). Subsequently, the pooled DNA comprising the assembled sgRNAs was sequenced on the Illumina HiSeq 3000 system with paired-end 150 bp reads (Figure 3E). The high-throughput sequencing results identified which sgRNA was present in each mutant. All sequenced individuals harbored T-DNA with the corresponding targeted sgRNAs, and no mock insertions were observed. Of the 116 designed sgRNAs, 83 were present in the mutant population, and the identified sgRNAs covered all the targeted genes.
Among the 718 T0 plants, 613 harbored only one sgRNA, 65 contained two different sgRNAs, 22 contained three different sgRNAs, and 18 contained a mutated sgRNA, which might have arisen during PCR amplification or from impurities in the synthesized sgRNAs (Figure 3F). The large proportion of plants with only one sgRNA (85% of all plants) indicates that our pooled sgRNA assembly is efficient enough to induce targeted mutations, with a high likelihood of producing many independent lines in a short time and at low cost. We therefore infer that CRISPR-Cas9-mediated pooled sgRNA assembly is a powerful strategy that may pave the way for functional genomics researchers to study multiple genes and their homologues, not only in cotton but also in other plant species with complex genomes.
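The barcoding logic and the per-plant sgRNA tally can be sketched in Python as follows. The split of the 44 primers into 12 forward and 32 reverse barcoded primers is only an assumption that happens to be consistent with 384 combinations (12 × 32 = 384); the actual design is the one given in Supplemental Table S3. Function names and barcode sequences are hypothetical, and the demultiplexing is simplified to exact barcode matches at the two ends of a read.

```python
from collections import defaultdict
from itertools import product


def make_barcode_grid(fwd_barcodes, rev_barcodes):
    """Map every (forward, reverse) barcode pair to a sample index."""
    pairs = product(fwd_barcodes, rev_barcodes)
    return {pair: index for index, pair in enumerate(pairs)}


def demultiplex(reads, grid, barcode_len):
    """Group barcode-trimmed inserts by sample index.

    Toy simplification: the read must start with the forward barcode and end
    with the reverse barcode exactly as written in `grid` (no mismatches, no
    reverse-complement handling). Unassigned reads are dropped.
    """
    by_sample = defaultdict(list)
    for read in reads:
        pair = (read[:barcode_len], read[-barcode_len:])
        sample = grid.get(pair)
        if sample is not None:
            by_sample[sample].append(read[barcode_len:-barcode_len])
    return dict(by_sample)


def class_percentages(class_counts):
    """Convert per-class plant counts to percentages of the total."""
    total = sum(class_counts.values())
    return {name: round(100 * n / total, 1) for name, n in class_counts.items()}
```

Applied to the counts reported above (613, 65, 22 and 18 plants out of 718), `class_percentages` gives about 85.4% for the single-sgRNA class, in line with the 85% stated in the text.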
Tracking mutations by Hi-TOM-based next-generation sequencing and analysis of gene editing efficiency
Many tools have been developed to track the types of mutation induced by genome editing [31–34, 44]; among them, Hi-TOM is a simple and inexpensive strategy for detecting hundreds of mutants induced by CRISPR/Cas9 technology [33]. After identifying the sgRNA present in each transgenic plant, we used the high-throughput Hi-TOM sequencing method to profile the mutations in each generated plant. In this study, 613 genomic DNA samples were used individually to amplify the targeted region of each candidate gene with site-specific primers. The targeted regions then went through a second round of amplification to label each plant with unique barcodes. The final DNA fragment resulting from the two rounds of PCR was about 200 bp. Every 96 individual PCR products were pooled as one sample for sequencing and analysis. The sequencing results showed that all candidate genes were successfully edited; 361 plants (58.89% of the 613 plants analyzed) exhibited editing at the sites targeted by our sgRNAs (Figure 4A). The edits comprised various types of targeted mutation in our candidate genes, among which deletions accounted for the highest proportion. Heterozygous mutations were also present in our population: 198 plants displayed editing either in only one allele or in both alleles at different sites. Only 54 plants showed no editing in their targeted genes. These results indicate that our CRISPR-Cas9-mediated pooled sgRNA assembly is a highly effective strategy for precisely editing a large number of genes in one round of transformation.
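To make the genotype categories above concrete, the following Python sketch classifies a plant from the read counts of the alleles observed at its target site. It is an illustration only and not the Hi-TOM algorithm: the 25% allele-fraction threshold, the category names and the function names are assumptions introduced here, and mutation type is inferred crudely from allele length.

```python
def mutation_type(ref: str, allele: str) -> str:
    """Classify an allele relative to the reference by length (crude heuristic)."""
    if allele == ref:
        return "wild type"
    if len(allele) < len(ref):
        return "deletion"
    if len(allele) > len(ref):
        return "insertion"
    return "substitution"


def call_genotype(allele_counts: dict, ref: str, min_fraction: float = 0.25) -> str:
    """Call a genotype from per-allele read counts for one plant.

    `min_fraction` is a hypothetical cutoff: alleles supported by a smaller
    share of reads are treated as noise.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no reads for this sample")
    major = [a for a, n in allele_counts.items() if n / total >= min_fraction]
    mutant = [a for a in major if a != ref]
    if not mutant:
        return "wild type"
    if len(major) == 1:
        return "homozygous"
    if len(mutant) == 1 and ref in major:
        return "heterozygous"
    if len(mutant) == 2 and ref not in major:
        return "biallelic"
    return "chimeric"


def editing_efficiency(edited: int, total: int) -> float:
    """Return the percentage of edited plants, rounded to two decimals."""
    return round(100 * edited / total, 2)
```

As a check on the reported figures, `editing_efficiency(361, 613)` returns 58.89, and the three reported groups (361, 198 and 54 plants) sum to the 613 plants analyzed.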
Mutation characteristics of the generated plants
Our pooled sgRNA assembly included different gene families that transcriptome data suggested are involved in regulating the development of the male reproductive organs in cotton. In the present study, these genes were successfully knocked out by CRISPR-Cas9 and a large number of mutants were generated. Once the mutants were obtained, it was therefore necessary to screen all generated lines, which carry edits in different genes, for any changes in growth or productivity compared with non-transgenic plants.
Aberrant morphological phenotypes
All transgenic plants harboring a single targeted sgRNA were grown in the greenhouse under normal growth conditions and compared with Jin668 as the control. Phenotyping was conducted by screening 613 independent lines representing the 112 successfully edited genes. Morphological comparison between the wild type and the library mutants revealed several floral phenotypes among the progeny of the transgenic plants (Figure 5A), such as long stigma (Figure 5B-H), short stigma and non-dehiscent anthers (Figure 5I-M), shriveled anthers (Figure 5E-G), fewer filaments (Figure 5O-R), and small flowers with small petals (Figure 6). These phenotypes might be key to understanding the molecular mechanisms of anther and pollen development in cotton, an area on which few studies have focused. These results also indicate that this system can generate a wide range of genotypic and phenotypic variation.
Seed production ability in the generated plants
One of the main purposes of growing T0 plants is to collect seeds for studying the T1 progeny. The generated plants were therefore transferred to the greenhouse, and seed production was recorded in all studied lines twice (3 months and 6 months after transfer). Interestingly, more than 53% of the mutated plants could not produce seeds (Figure 4B). Notably, the plants that failed to produce seeds were homozygous mutants. The failure of seed production might also be due to somaclonal variation. These results demonstrate that our library may provide a good resource for further in-depth functional genomic studies, especially of reproductive traits in cotton. It also opens new avenues for functional genomics research to study multiple genes in a short time.