Chloroplast genome analysis and genetic transformation of Parachlorella kessleri-I, a potential marine alga for biofuel production

Background: Parachlorella kessleri-I produces high biomass and lipid content suitable for commercial production of biofuels. Sequencing the complete chloroplast genome will be instrumental in constructing species-specific chloroplast transformation vectors and in generating chloroplast-transgenic microalgae with the desired traits and greater productivity, both essential for the commercial sustainability of microalgae-based biofuel production.
Results: The complete chloroplast genome sequence (cpDNA) of P. kessleri-I was annotated. The 109,642 bp chloroplast genome exhibited a quadripartite structure with two inverted repeat regions (IRA and IRB), a large single copy (LSC) region and a small single copy (SSC) region. The genome encodes 117 unique genes, including 70 predicted protein-coding genes, 35 tRNAs and 4 rRNAs. The cpDNA provided essential information such as codon usage, UTRs and flanking sequences for homologous recombination, which was used to build a species-specific chloroplast transformation vector that facilitated chloroplast transformation of P. kessleri-I. To optimize chloroplast transformation, two antibiotic resistance markers were tested: the aminoglycoside adenylyltransferase gene (aadA), conferring resistance to spectinomycin, and the bacterial Sh-ble gene, conferring resistance to zeocin. Using the aadA gene, transgenic colonies were recovered on TAP medium containing 400 mg/l spectinomycin. However, no transgenic colonies were recovered on the zeocin-supplemented medium. The spectinomycin-resistant algal cell lines were analyzed by PCR, and Southern blotting confirmed stable integration of the transgenes into the chloroplast genome of P. kessleri-I via homologous recombination.
Conclusion: The complete chloroplast genome analysis may provide valuable resources for population and evolutionary studies of Parachlorella species and for identifying related species. The chloroplast genome of P. kessleri-I was assembled as a quadripartite structure of 109,642 bp with defined IR regions. Its complete sequence provided essential information such as codon usage, UTRs and flanking sequences to generate the species-specific chloroplast transformation vector and to obtain successful chloroplast transformation for improving the photosynthetic process and introducing a carbon concentration mechanism (CCM) to fix more CO2, which may be instrumental in producing economically viable biofuel molecules. The trnI-GAU, trnA-UGC, 16S and 23S ribosomal DNA unique sites helped in obtaining precise integration and expression of transgenes. The optimized chloroplast transformation via homologous recombination at the specific site should help in improving the P. kessleri-I strain for enhanced biofuel properties. This study should open a new gateway in chloroplast engineering of Parachlorella spp. and improve biofuel value via a carbon concentration mechanism, overexpression of the RuBisCo enzyme for higher photosynthesis, and expression of short-chain fatty acid thioesterases to make biofuel suitable as drop-in jet fuel or to produce value-added products.

Background
Parachlorella is a genus of unicellular green algae belonging to the class Trebouxiophyceae in the order Chlorellales. The classes Trebouxiophyceae, Prasinophyceae, Ulvophyceae and Chlorophyceae belong to the phylum Chlorophyta (1,2), among which Prasinophyceae shows the most basal divergence (3), while the Trebouxiophyceae emerged before the Ulvophyceae and Chlorophyceae, as suggested by chloroplast and mitochondrial genome data (4)(5)(6). Chloroplasts were captured early during the evolution of the eukaryotic cell and are considered to have originated from cyanobacteria through endosymbiosis. During the course of evolution, extensive rearrangements occurred within the chloroplast genome, or plastome. Compared with free-living cyanobacteria, the chloroplast genome is substantially reduced in size, yet many chloroplast DNA sequences still resemble those of the cyanobacterial genome (7). It is generally believed that land plants evolved from green algae (8), and the chloroplast genome is highly conserved among plant species (9). Most land plant plastomes have two identical copies of an inverted repeat region (IRA and IRB) separating a large single copy (LSC) region and a small single copy (SSC) region (10,11). Therefore, information on repeated sequences, intergenic regions and pseudogenes in the chloroplast DNA may be very helpful in deciphering the process of chloroplast genome evolution.
Genetic transformation of industrial microalgae may improve economic viability by overexpressing desired traits for commercial feedstock development and sustainable biofuel production (12)(13)(14)(15)(16). Chloroplast transformation is routine in the plant Nicotiana tabacum and the alga Chlamydomonas reinhardtii, the species in which stable chloroplast transformation was first achieved, but it has been challenging in other plant species and algae. A nuclear genome-based genetic transformation method was reported for Parachlorella sp. (17). However, plastome genetic engineering tools have not yet been reported for this species. The algal chloroplast, which carries several copies of its genome, occupies about two thirds of the cellular space. The chloroplast genome is prokaryotic in nature, which enables site-specific recombination between homologous DNA sequences and provides a unique opportunity for transgene overexpression and genetic modification (18). Because each cell carries multiple copies of the chloroplast transgene, chloroplast transformation gives higher levels of transgene expression, so higher levels of biofuel molecules and value-added products can be produced in algae. Moreover, transgene integration in the chloroplast genome is precise and unaffected by phenomena such as pre- or post-transcriptional silencing, despite transcripts accumulating at 169-fold higher levels than in nuclear transformation (20). There are no deletions or rearrangements of transgene DNA at the site of insertion. This is an advantage over nuclear transformation, which usually leads to random integration of transgenes (19). Moreover, in chloroplast genome transformation, a transgene integrated into one IR region is duplicated into the other IR via the RecA enzyme (10). Chloroplast genes in the green microalga C. reinhardtii are known to be transmitted via maternal inheritance (21).
The uniparental inheritance of chloroplast traits in green algae (21,22) can be the key to generate genetically modified species with lower environmental risks (23)(24)(25). This would be beneficial in biological containment of genetically modified strains when cultured outdoors in large volumes e.g. raceway ponds and photobioreactors.
In brief, chloroplast transformation in algae may be useful for overexpressing thioesterases to produce fatty acids of desired chain lengths, for overexpressing a carbon concentration mechanism (CCM) to fix more CO2 and reduce photorespiration, and for improving photosynthetic efficiency by expressing Rubisco transgenes directly in the chloroplast genome.
Researchers have been studying algal fuels extensively since the 1970s, but no commercially viable strain has yet been isolated (26). In the past, some programmes had to be shut down because algal biofuels cost more than fossil fuels. Recombinant DNA techniques offer the greatest range of options for improving on the performance of wild strains. The transgenic approach has therefore been investigated as a way to increase the productivity of algae and to compete with relatively cheaper fossil fuels. P. kessleri accumulates high starch and lipid content under laboratory conditions as well as on a semi-industrial scale in outdoor photobioreactors (27) and is considered an ideal microalgal species for biofuel production. It occurs in both freshwater and marine environments.
To obtain optimal expression of transgenes in the chloroplast, spacer regions and endogenous regulatory sequences for transgene integration must be identified. When designing chloroplast transformation vectors, homologous flanking sequences are attached on either side of the transgene cassette to facilitate double recombination, thereby eliminating concerns about position effects. A novel marine strain, P. kessleri-I, was isolated from the Indian Ocean and contains high lipids on a dry weight basis (14). Thus, to gain more insight, we undertook complete sequencing and annotation of the chloroplast genome of the oleaginous marine alga P. kessleri-I.
Phylogenetic analysis indicated that P. kessleri-I shares common ancestry with the freshwater P. kessleri. The chloroplast genome sequence information will be used to design an appropriate species-specific vector using codon optimization, internal UTRs and other elements essential for higher expression of transgenes in P. kessleri-I.
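Codon optimization of a transgene usually starts from a codon usage table built from the host's own coding sequences. The following is a minimal sketch of that tallying step only, assuming in-frame coding sequences are already available as strings; the function name `codon_usage` and the toy sequences are hypothetical and are not taken from the P. kessleri-I plastome.

```python
from collections import Counter

def codon_usage(cds_list):
    """Relative frequency of each codon across in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Walk the sequence in steps of three; ignore any trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons found")
    return {codon: n / total for codon, n in counts.items()}

# Hypothetical toy coding sequences, for illustration only.
toy_cds = ["ATGGCTGCTTAA", "ATGGCAGCTTAA"]
usage = codon_usage(toy_cds)
for codon in sorted(usage):
    print(codon, round(usage[codon], 3))
```

A real optimization step would go further and compare synonymous codons within each amino acid; this sketch only reports raw frequencies.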

Chloroplast genomic organization and features
The whole genomic DNA of P. kessleri-I was sequenced using Illumina technology. A total of 17,498,340 paired-end reads were generated with a read length of 150 bp. NOVOPlasty v3.0 (28) was used to assemble the circular chloroplast genome of P. kessleri-I. The chloroplast DNA sequence of P. kessleri-I was assembled as a quadripartite structure of 109,642 bp (Fig. 1) with LSC (41,443 bp), SSC (35,669 bp) and a pair of IRs (10,216 bp). The organelle genome percent obtained was 6.09% and the average organelle coverage was 1539 using NOVOPlasty v3.0 (28). The GC and AT content of the whole plastome were 29.5% and 70.5%, respectively.
In the chloroplast genome of P. kessleri-I, a total of 116 genes were predicted, including 23 photosynthetic genes (6 for PSI and 17 for PSII), 5 cytochrome complex genes, 6 ATP synthase genes, 4 ATP-binding genes and one gene for the large subunit of Rubisco. There were 75 protein synthesis genes, including 32 tRNA genes, 5 rRNA genes, 11 small ribosomal subunit genes, 8 large ribosomal subunit genes and 4 RNA polymerase subunit genes.
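GC and AT percentages are complementary, so the two values reported for a plastome should sum to 100%. A minimal sketch of how such figures can be computed from an assembled sequence is given here; the function name `base_composition` and the ten-base toy sequence are hypothetical, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
from collections import Counter

def base_composition(seq):
    """Return (GC %, AT %) over the unambiguous bases of a DNA sequence."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    total = gc + at  # ambiguous bases (e.g. N) are ignored
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total, 100.0 * at / total

# Hypothetical toy sequence, not real plastome data.
toy = "ATATGCATTA"
gc_pct, at_pct = base_composition(toy)
print(f"GC {gc_pct:.1f}%  AT {at_pct:.1f}%")
```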

Comparative Analysis
Genomic alignment was carried out using MAUVE software (30) to identify evolutionary changes in the DNA by aligning homologous regions of sequence. The marine P. kessleri-I and freshwater P. kessleri plastomes were laid out horizontally. The colored blocks represent homologous segments that are connected across the genomes. Blocks that lie above the centre line are aligned in the forward orientation relative to the first genome sequence, i.e. P. kessleri-I. Blocks below the centre line represent regions that align in the reverse complement orientation.
Comparative analyses showed that the chloroplast genomes of the marine and freshwater algae have significant homology (Fig. 2). The mauve-colored region of the similarity plot represents the part conserved in both genomes (Fig. 2A), and the height of the similarity profile shows the average level of conservation in a particular region (Fig. 2B).

Chloroplast transformation and selection of transgenic cultures
The growth of P. kessleri-I was optimized on semisolid TAP medium, and varying concentrations of zeocin and spectinomycin were used to determine the antibiotic sensitivity of P. kessleri-I cells. Zeocin at 20 mg/L and spectinomycin at 400 mg/L stopped the growth of wild-type algal cells, and these concentrations were used to select transgenic algae after introducing the PkCpV cassette via bombardment. In the chloroplast transformation vector containing the PkCpV cassette, the Sh-Ble gene, which confers resistance to bleomycin, phleomycin and zeocin (Sh ble protein: QBQ65853), was placed under the 16S-rrn PEP promoter (AC# KY426960.1) along with the 5'UTR-g10 translator of E. coli (AC# AF176637). The aadA gene (AC# AY442171), which confers resistance to spectinomycin, was placed under the light-regulated psbA promoter along with the 5'UTR-psbA translator and was terminated with the 3'UTR-psbA (similar to AC# NC_001879). The PkCpV cassette was designed in silico as shown in supplementary figure S1, and the DNA vector was synthesized (BioBasic Inc., Canada).
Following the reported protocol (31), a lawn of P. kessleri-I cells was prepared and bombarded with the chloroplast transformation vector PkCpV. Sixty plates were bombarded using a 1350 psi rupture disk at a distance of 6 cm, and 50 plates were bombarded at a distance of 9 cm. The bombarded plates were incubated in the dark for two nights. On the third day, approximately equal numbers of cells (1 × 10^6) were plated on selection medium containing zeocin or spectinomycin. After 3-4 weeks on TAP medium containing 400 mg/l spectinomycin, a single green colony was observed from plates bombarded at 6 cm and two green colonies from plates bombarded at 9 cm. However, no transgenic colonies appeared on TAP medium containing zeocin (20 mg/l), even six weeks after bombardment. The spectinomycin-resistant colonies were subcultured every two weeks on antibiotic-containing TAP medium for 4 months. Transgene integration was analysed by both PCR and Southern blot.
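The colony counts above can be put on a per-plate basis with simple arithmetic. The sketch below uses only the plate and colony numbers stated in the text; the function name `colonies_per_plate` is hypothetical, and with so few colonies the difference between the two distances should be read as indicative rather than statistically meaningful.

```python
# Reported outcomes: bombardment distance (cm) -> (plates bombarded, spectinomycin-resistant colonies)
results = {6: (60, 1), 9: (50, 2)}

def colonies_per_plate(plates, colonies):
    """Resistant colonies recovered per bombarded plate."""
    return colonies / plates

for distance_cm, (plates, colonies) in results.items():
    rate = colonies_per_plate(plates, colonies)
    print(f"{distance_cm} cm: {colonies} colonies / {plates} plates = {rate:.3f} per plate")

total_plates = sum(plates for plates, _ in results.values())
total_colonies = sum(colonies for _, colonies in results.values())
print(f"overall: {total_colonies} colonies from {total_plates} plates")
```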

Analysis of transgene integration in chloroplast genome
The three spectinomycin-resistant transgenic colonies (Fig. 3B), namely T1, T2 and T3, were tested for transgene integration in the chloroplast genome. The transformed cell lines of P. kessleri-I and wild-type (WT) cells were grown in liquid TAP medium to bulk up biomass for extraction of total genomic DNA. The presence of the aadA gene integrated into the chloroplast genome was determined by PCR as described in Material and Methods. The expected PCR amplicon of ~850 bp for the aadA gene was observed, which confirmed transgene integration in P. kessleri-I cells (Fig. 3C). No PCR amplification was observed in the WT cells.
To ascertain whether the targeted DNA cassette had indeed integrated at the specified position in the chloroplast genome, between the 16S-trnI and trnA-23S loci, genomic DNA of the transgenic cell lines was extracted and digested with the NcoI and HindIII restriction enzymes. The NcoI site was located within the vector cassette, while HindIII cut the DNA outside the right flank (trnA-23S) region, as shown in Fig. 3A.
Transgenic algal cell lines T1, T2 and T3 were maintained as frozen cultures at -70°C as reported (14) and were not exposed to or disposed of into the environment at the end of the experiment.
[...] bp with a 70% AT-rich region (29). P. kessleri-I contains 117 genes (Table 1), whereas the freshwater P. kessleri strain contains 112 genes. The additional genes present in the marine strain are pbf1 (photosynthesis biogenesis factor), mbpX (similar to cysA for ATP binding) and Psb30 (PSII reaction centre subunit). At 109,642 bp, the P. kessleri-I chloroplast genome is smaller than that of its freshwater counterpart (123,994 bp). Progressive genome alignment was carried out to ascertain whether P. kessleri-I has undergone any genome rearrangement, gene gain, loss or duplication.
MAUVE software was used to align the two genomes, and most of the sequence was found to be conserved in both genomes (Fig. 2B). This suggests a common ancestry for the marine and freshwater strains, which may have acclimatized to their different habitats.
Chloroplasts are ideal hosts for expression of transgenes (10). A major advantage of chloroplast transformation is the ability to accumulate large amounts of foreign protein. The reported yields of recombinant proteins in algal chloroplasts are generally in the range of 0.1 to 5% of the total soluble protein (TSP) of the cell (33). Protein folding and disulphide bond formation also occur readily in chloroplasts, making them an ideal platform for producing proteins with multiple domains or subunits.
Moreover, the chloroplast provides a sub-cellular compartment for accumulation of recombinant proteins. Another major advantage of chloroplast engineering is that transgenes are inserted precisely into specific sites in the chloroplast genome via homologous recombination rather than at random, unlike nuclear transgene integration. Homologous recombination is very common in plastid genomes and is mediated by the RecA protein. RecA also helps maintain chloroplast DNA integrity by participating in homologous recombination DNA repair (34). Complete homology of the plastid DNA flanking sequences ensures highly efficient chloroplast transformation. The PkCpV cassette was flanked by the endogenous 16S rDNA-trnI sequence on the left and the trnA-23S rDNA sequence on the right. Integration of transgenes in the intergenic region between the trnI-GAU and trnA-UGC genes has resulted in high levels of expression in microalgae as well as in plants (35); hence this region was chosen for site-specific integration of transgenes owing to its high transcriptional activity. By digesting the transgenic genome at the NcoI site and with the HindIII restriction enzyme, we confirmed that the transgenes Sh-Ble and aadA had integrated precisely between the trnI and trnA regions. HindIII cuts outside the flanking region of the vector, which confirmed that the transgenes are integrated at the specified site. Spectinomycin was also observed to give better selection after transformation than zeocin. This may be because of heightened sensitivity of P. kessleri-I to zeocin in the absence of any salts, as increasing NaCl beyond 0.2 M decreases antibiotic sensitivity in microalgae (36). A similar observation was reported by Muñoz et al., 2019 (37), where zeocin was tested for sensitivity against Acutodesmus obliquus and Neochloris oleoabundans but was not used for selection of transformants.
Comparative growth analysis showed that WT and transgenic cell cultures produced about the same amount of biomass and lipid, indicating that the transgenes integrated into the chloroplast genome of P. kessleri-I had no detrimental impact. Chloroplast transformation of other microalgae has yielded a variety of bioproducts such as vaccines, monoclonal antibodies and biocatalysts. The Chlamydomonas reinhardtii chloroplast has been extensively exploited for production of subunit vaccines, monoclonal antibodies, immunotoxins and cancer cell therapeutics (33). The chloroplast of another marine microalga, Dunaliella tertiolecta, was transformed for production of enzymes such as xylanase, α-galactosidase, phytase, phosphatase and β-mannanase (38).
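The logic of the NcoI/HindIII check is that a fragment bounded by a site inside the cassette and a site outside the right flank can only arise if the cassette sits at the targeted locus. A minimal in-silico double digest illustrating the fragment-length calculation is sketched here. The recognition sequences (NcoI C^CATGG, HindIII A^AGCTT) are standard, but the function names and the toy sequence are hypothetical and do not represent the PkCpV vector or the P. kessleri-I plastome; a linear molecule and a complete digest are assumed.

```python
import re

# Recognition sequence and top-strand cut offset within the site.
ENZYMES = {
    "NcoI": ("CCATGG", 1),     # C^CATGG
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}

def cut_positions(seq, enzymes=ENZYMES):
    """Sorted top-strand cut coordinates on a linear sequence."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead pattern also finds overlapping occurrences.
        # Both sites are palindromic, so scanning one strand is sufficient.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)

def fragment_lengths(seq, enzymes=ENZYMES):
    """Fragment lengths from a complete digest of a linear sequence."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical toy sequence with one NcoI site and one HindIII site.
toy = "GGGG" + "CCATGG" + "TTTTTTTT" + "AAGCTT" + "CC"
print(cut_positions(toy))
print(fragment_lengths(toy))
```

In a Southern blot, only the fragments that hybridize to the probe would be visible, so a sketch like this gives the expected band sizes rather than the full banding pattern.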

Conclusion
Marine P. kessleri-I (ICGEB strain) is a robust, oil-bearing strain that is industrially important for producing biofuel molecules. We studied the whole chloroplast genome of the P. kessleri-I strain, isolated from the Indian Ocean. It was sequenced on the Illumina platform, assembled using NOVOPlasty v3.0 and annotated using GeSeq (MPI CHLOROBOX). Manual annotation using NCBI BLAST was done to confirm the repeat regions. The complete P. kessleri-I chloroplast genome is 109,642 bp in length, with 117 unique genes.

DNA Extraction: Qualitative and Quantitative analysis of gDNA
The P. kessleri-I strain was scaled up in liquid f/2 medium. The culture was centrifuged at 3600 × g and the pellet was resuspended in 3 ml ice-cold breaking buffer containing 5 mM dithiothreitol (DTT) and 5 mM sodium ascorbate. Algal cells were sonicated on ice using a cell disruptor (Sonics, 3 mm tapered microtip probe) at a frequency of 1 kHz, with 5 s on/5 s off pulses at 30% amplitude for 2 min. The algal cells were collected by centrifugation and the pellet was resuspended in 1 ml buffer containing 10 mM HEPES, 600 mM sorbitol, 50 mM MgCl2 and 0.1% bovine serum albumin (BSA), pH 7.5-7.8, at 4°C. The autofluorescence of intact chloroplasts was monitored using Fluorescent Brightener 28 (Sigma Aldrich). Fluorescence from stained cellular components was observed at 650 nm and autofluorescence from intact chloroplasts at 420 nm. The presence or absence of fluorescence was observed using a Nikon fluorescence microscope. After confirmation of chloroplast release, the enriched chloroplast fraction was layered on a 60-70% sucrose gradient. The gradients were allowed to equilibrate overnight at 4°C. The gradient layered with cell lysate was centrifuged at 197,120 × g for 60 min at 10°C in an ultracentrifuge (Beckman Coulter Optima XE-100). The chloroplast fraction was observed as a single dark green band near the middle of the tube (41). The plastomic DNA was extracted using the CTAB (cetyltrimethylammonium bromide) buffer method (42).
The quality of the genomic DNA was checked on a 1% agarose gel for a single intact band. A 1 µl sample was used to determine the DNA concentration with a NanoDrop™ 2000 Spectrophotometer (Thermo Scientific™). The purified cpDNA was supplied to Xcelris Genomics (Ahmedabad, India) for sequencing of the chloroplast genome of P. kessleri-I.

Genome sequencing and data pre-processing
The chloroplast genome of P. kessleri-I was sequenced with the help of Xcelris Genomics (Ahmedabad, India). Paired-end sequencing libraries were prepared using the TruSeq Nano DNA Library Prep kit and sequenced on an Illumina NextSeq (2 × 150 bp PE). A high-fidelity amplification step was performed using HiFi PCR Master Mix to ensure maximum yield.

Chloroplast genome Assembly and Annotation
The chloroplast genome of P. kessleri-I was assembled de novo with NOVOPlasty v3.0, using untrimmed reads and the ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) gene as the seed sequence (28). The other specified parameters included a genome range of 100000-220000, a K-mer value of 39, the reported freshwater P. kessleri chloroplast genome (NC_012978.1) as the reference sequence and the paired-end reads option. GeSeq, a web-based annotation tool (part of the CHLOROBOX toolbox) designed for annotation of organellar sequences (43), was used to annotate the chloroplast genome. The annotation was confirmed manually as well as with NCBI BLAST (39) and DOGMA (44). The cp genome was visualized using OGDRAW (45). All transfer RNA (tRNA) sequences encoded in the cp genome were verified using ARAGORN v1.2.38 (46) with default parameters.
The progressive MAUVE genome alignment algorithm was used with default parameters to compare the two genomes and to detect any gene loss, duplication, genome rearrangement or horizontal transfer caused by recombination.

Study of antibiotics sensitivity
Prior to chloroplast transformation, algal cells were adapted to grow on TAP medium (47) without added NaCl (salt) so that sensitivity to antibiotics could be determined accurately. In the presence of salt, algal cells may become tolerant to antibiotics (36). To identify the optimal concentrations of the antibiotics zeocin (Invitrogen) and spectinomycin (Sigma Aldrich), different concentrations of zeocin (10, 20, 40, 60, 80 and 100 mg/L) and spectinomycin (100, 200, 300, 400, 500 and 600 mg/L) were tested on TAP medium solidified with 1.2% agar. Approximately 1.5 × [...]
Transgene integration was further confirmed by Southern blot analysis. Genomic DNA of transformed and untransformed cells was extracted using the DNeasy plant mini kit (Qiagen). About 6 μg of genomic DNA was digested with NcoI and HindIII (the latter cutting outside the flanking region of 23S) as shown in Fig. 3A.