Comparative transcriptome analysis reveals genetic mechanisms of phenotypic differences between Landsberg erecta-0 and Columbia-0 ecotypes in Arabidopsis

As common ecotypes of Arabidopsis thaliana, obvious phenotypic differences have been observed between Landsberg erecta-0 (Ler-0) and Columbia-0 (Col-0). However, it is unlikely to explain a phenotypic trait from each single-gene mutant. In this study, a genotype to phenotype prediction using comparative transcriptome analysis was established to study multigene regulation in biological processes. Analyses of RNA sequencing data of Ler-0 and Col-0 revealed that the differences on gene expression in different organs were larger than those in different ecotypes. Among the 671 differential expressed genes between Ler-0 and Col-0, 273 genes were up-regulated and 398 genes were down-regulated in Ler-0. Gene Ontology (GO) analyses were performed by using these genes, and 43, 31 and 58 genes were enriched in response to cold, response to cold and response to hormone stimulus respectively. Due to these datas, we then tested the sensitivity of Col-0 and Ler-0 to cold and abscisic acid (ABA) stress, and Col-0 ecotype exhibited more tolerance to cold stress and ABA. We also quantied the cell and leaf length of Col-0 and Ler-0, and these different growth phenotypes may be partly caused by the genes which enriched in response to hormone stimulus. The differential expressed genes identied through the mRNA transcriptome provide a revealing insight into guiding phenotyping of organisms and providing genotype to phenotype relations for better understanding of plant growth, development and response to environments. This approach could also be applied to genetic mutants for discovery of novel phenotypes thus better understanding of the mutant gene functions.


Introduction
Columbia-0 (Col-0) and Landsberg erecta-0 (Ler-0) are common ecotypes of Arabidopsis thaliana and widely used as models for botanical studies (Karuppanapandian et al. 2012;Yin et al. 2012;Tanaka et al. 2020). Ler-0 exhibits top clustered owers, round leaves with short petioles which differ from Col-0 (Rédei 1992). However, these phenotypical traits are unlikely to be explained by differences at one single gene level (Simon et al. 2013). In addition to these phenotypic differences, several other differences have also been reported. Yamamoto-Toyoda et al. (1999) has reported that ABA controls the chlorophyll content of the Col-0 seedlings, but it has no effect on Ler-0 seedlings. IAA activates the plasma membrane H + -ATPases in the segments of Ler but not those of Col, while fusicoccin (FC) activates them in both ecotypes (Soga et al. 2000). Ler-0 produces more lateral roots from a higher percentage of primordia and has an overall larger root system than that of Col-0 under osmotic stress conditions (Gerald et al. 2006).
Ler-0 deals with drought by accelerating owering thus escaping drought conditions, whereas Col-0 copes with drought by tolerant mechanisms (Meyre et al. 2001). Previous studies on different ecotypes often started from the observation of phenotypical differences, and then detected the expression of genes associated with the phenotypes. However, phenotyping of plants, in particular genetic mutants, for new phenotypes that are not easily observable is not an easy task, and important phenotypes could be missed due to the lack of guidance for phenotyping (Guan et al. 2014).
The whole genomes of both Col-0 and Ler-0 ecotypes have been sequenced, and their genomic sequences showed that they were containing not only rich SNPs but also many large indels (Gan et al. 2011). These differences account for the genetic basis of their phenotypic differences. However, the phenotypes resulted from the integration of both genetic and environmental factors, and the mechanisms on how genetic differences are translated into the phenotypes is very complicated. Although a lot of works have been done to investigate the phenotypes between these two ecotypes, but the molecular basis of the phenotypical differences has been poorly studied.
RNA sequencing (RNA-Seq) is a modern technology for transcriptome studies by using the highthroughput sequencing (HTS) platforms, and it can quantify overall expression levels and the degree of alternative splicing for each gene simultaneously (Ozsolak and Milos 2011). With rapid declining of the cost for RNA sequencing, the RNA-Seq datas of Arabidopsis are accumulating explosively. These publicly available RNA-Seq datas provide the deep-mining clues for scienti c community to study the relationship between different phenotypes and global transcription levels. Some phenotypes that unable to discover by conventional methods could be explored by RNA-seq analysis.
In this paper, we used the transcriptomic data of Col-0 and Ler-0 from the public NCBI database and reanalyzed the data to dig out the genes showing different expression levels between Col-0 and Ler-0.
Based on the data analysis, we investigated the phenotypic differences between Col-0 and Ler-0 by Gene Ontology (GO) enrichment analysis of these signi cantly differential expressed genes. Our results would demonstrate that this so-called phenotype-mining based on GO analysis of differential expressed genes is a reliable method for new phenotypes discovery between different ecotypes as well as genetic mutants of Arabidopsis.

Materials And Methods
Materials RNA-Seq data of Ler-0 and Col-0 from seedlings, roots and oral buds tissues were searched and downloaded from NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra/), with accession numbers SRX084368, SRX084369, SRX084370, SRX389757, SRX389758 and SRX389759 respectively, which are listed in Table 1. The Arabidopsis thaliana genome, gene models, gene functional descriptions and Gene Ontology assignments were downloaded from The Arabidopsis Information Resource (TAIR, https://www.arabidopsis.org/).
The data of Arabidopsis thaliana genome and relevant gene descriptions were extracted and used as template for mapping and assembling the assessed and ltered RNA-Seq reads by Bowtie (version 2.1.0) and TopHat software (version 2.0.8) with default parameters (Langmead et al. 2012;Kim et al. 2013).
Cu inks software was run with default parameters to generate fragment per kilobase per million reads (FPKM) values (Trapnell et al. 2012;Begara-Morales et al. 2014). The output les were further annotated (in-house Perl scripts) by adding gene functional descriptions and GO classi cations, and merged into a master le (Supplemental Data Set 1).

Gene expression pro ling analysis and GO/KEGG enrichment analysis
After assembling all the genes, the copy number of each gene was counted from library. And the expression pro les were displayed by a heatmap analysis using heatmap.2 in the "gplots" package of R program. Then, online AgriGO (GO, http://systemsbiology.cau.edu.cn/agriGOv2/index.php) was used for gene-GO term enrichment and functional annotation of the differential expressed genes (P<0.05). The Arabidopsis thaliana TAIR10 was used as selected species (Du et al. 2010).
Gene ontology offers both Fisher's exact test and a false discovery rate (FDR) with a threshold of 0.05 to reduce false positive prediction. The differential expressed genes' sequences (P<0.05) were submitted to the KEGG (Kyoto Encyclopedia of Genes and Genomes) Automatic Annotation Server (KAAS) and the single-directional best hit information method was selected to identify the BRITE functional hierarchies and to enrich the pathway annotation Moriya et al. 2007). In the end, differential expressed genes with possibly same or similar functions were clustered. The clustered differential expressed genes between Ler-0 and Col-0 were used to predict and analyze the phenotypic differences between these two ecotypes.

RT-PCR assays for genes responsive to cold
Seedlings of Ler-0 and Col-0 were grown on 1/2MS medium at 22℃ under controlled conditions (16 h light / 8 h dark photoperiod) for 9 d. Total RNA was extracted by using RNeasy Plant Kit (Qiagen, Hilden, Germany) with DNase I (Takara, Dalian, China), and reverse transcription was carried out according to the manufacturer's protocol of Moloney Murine Leukemia Virus (M-MLV) Reverse Transcriptase (Promega, USA). Six cold responsive genes were selected for RT-PCR analysis ( Table 2).
Germination of Col-0 and Ler-0 under different concentrations of ABA The seeds of Col-0 and Ler-0 were sowed on 1/2MS medium with different concentration of ABA (0, 0.25, 0.5, 1 and 2 μM) (Lefebvre et al. 2006), pre-processed at 4℃ for 2 days and then transferred to a growth chamber to germinate at 22℃ under dark. Germination rate was counted after 24 h of incubation.
Germination was determined according to the appearance of the radicle protrusion, observed under a microscope.

Assay for mortality rate and injury of seedlings under freezing conditions
For cold stress assay, seeds of Col-0 and Ler-0 were grown on 1/2MS medium pre-processed at 4℃ for 2 days and then transferred to a growth chamber to germinate at 22℃. The seedlings were then exposed to -20℃ for 2 h. Mortality rate and cold injury of seedlings were recorded at each 0.5 h. Relative electrolyte leakage (EL) was determined following the method described by Dong et al. (2009).

Measurement of petiole and cell lengths
Seedlings of Ler-0 and Col-0, grown on 1/2MS medium at 22℃ under controlled conditions (16 h light / 8 h dark photoperiod) for 7 days, were used as materials for measuring the cell length. The epidermis of petioles of cotyledons and the uppermost quarter of hypocotyls were peeled and then sealed in a glass slide by water. The slides were observed under a Leica DMRB DIC uorescence research microscope (Leica, Germany) and the cell length was determined using the micrometer. For measurement the length of petioles of true leaves, the 7d-old seedlings from 1/2MS medium were transplanted to soil and grew for 3 weeks in a growth chamber at 22℃ under the same photoperiod as described above. The fth and sixth leaves were used to measure the length of petioles.

Results
Assembly and analysis of the assembled RNA-Seq reads With the help of the Arabidopsis thaliana genome, six packages of RNA-Seq Reads downloaded from NCBI Sequence Read Archive were mapped and assembled. The lowest ratio of aligned reads (73.6%) is from the Col-0 root sample and lowest ratio of uniquely aligned reads is 65.71% of Ler-0 roots (Table 1).
So, the quality of RNA-seq data (sequenced by Illumina Genome Analyzer IIx, a relative lower version of Illumina series) should be high enough to support our further analyses. In general, Col-0 and Ler-0 presented similar expression pro les as showed by analysis of the seedling pair in Figure 1A and 1B. However, we found some differentially expressed genes between Col-0 and Ler-0. Exactly, totally 671 differently expressed genes were found (P<0.05) from the seedling pair of Col-0 and Ler-0 (Supplemental Data Set 1). Among them, 273 genes were up-regulated and 398 genes were down-regulated in Ler-0 than in Col-0 (Supplemental Data Set 1). Meanwhile, a heatmap displayed the expression pro les of genes from seedlings, roots and oral buds of Col-0 and Ler-0 ( Figure 1C). Hierarchical Clustering analysis showed that samples from the same organs of different ecotypes were gathered together, but different organs from the same ecotype did not ( Figure 1C). This result suggests that the differences on gene expression among different organs were larger than those between ecotypes Col-0 and Ler-0 of Arabidopsis thaliana.
To dig out the functional information of differential expressed genes, the differential expressed genes were classi ed by Gene ontology (GO) annotations with similar functionalities (Supplemental Data Set 2, Supplemental Data Set 3). As for the seedling pair, among the 671 differential expressed genes, 273 genes are expressed higher in Ler-0 and 398 genes are expressed lower in Ler-0 (Supplemental Data Set 1); 43 genes are functionally related to the response to cold with 25 genes up-regulated (including COR15B, COR15A, ICE1, RBGA6 and COR27 etc.) and 18 genes down-regulated in Ler-0 (including WRKY33, ECS1, DAR4 and WRKY70 etc.) (Figure 2, Supplemental Data Set 3); 31 genes related to the response to abscisic acid stimulus with 20 genes are up-regulated and 11 genes are down-regulated in Ler-0 (Figure 2, Supplemental Data Set 3). To evaluate the reliability of our analysis, we randomly selected 6 genes from the 31 genes belonging to "response to cold" group ( Figure 2

, Supplemental Data
Set 3) and examined their expression using semi-quantitative RT-PCR method. And all the tested genes showed expression differences similar to the RNA-seq analysis results (Figure 3).
Col-0 is more tolerant to freezing than Ler-0 To validate whether the GO enrichment analysis could provide useful clues for phenotypes, we rst compared the sensitivity of Col and Ler-0 to cold and freezing stress. Based on the previous analysis of transcriptomic sequencing data, we hypothesized that Ler-0 is more resistant to low temperatures. However, experimental results showed that Ler-0 was more sensitive to low temperatures. We rst treat the two ecotypes under -0℃ for different hours (1 h, 2 h, 3 h, 4 h, or 5 h), there was no apparent phenotypical difference was detected. After exposure to -20℃ for 1.5 h, it was signi cantly showed different mortality rate comparing of the two ecotypes ( Figure 4A-4C), the mortality rate of Ler-0 was 2.5 times more than that of Col-0 and the electrolyte leakage (EL) value of Ler-0 was also about 2 times than that of Col-0 ( Figure 4D). However, when the exposure time was increased to 2 hours, approximately 90% of the seedlings from both ecotypes were killed.
Col-0 is more resistant to ABA in seed germination than Ler-0 Because 31 differential expressed genes between Col-0 and Ler-0 seedlings were clustered in the group of genes responding to abscisic acid (ABA) and 20 out of the 31 genes had higher expression levels in Ler-0 ( Figure 2, Supplemental Data Set 3), we further examined whether ABA treatment would result in different effects on the seed germination of these two ecotypes. Finally, our results showed that ABA treatments exactly result in different seed germination rates between Col-0 and Ler-0. Under 2 μM ABA treatment, germination rate of Col-0 was approximately 60% while the seed germination rate in Ler-0 was reduced to about 20% ( Figure 4E-4G).
Epidermal cells in hypocotyls and petioles of Col-0 are longer than those of Ler-0 Besides GO, KEGG analysis could also provide some useful information for prediction or explanation of phenotypes. Our KEGG analysis indicated that some plant hormones, such as brassinosteroid, could modulate genes involving in plant growth, cell elongation etc. (Yu et al. 2011; Figure 5A). 58 differential expressed genes between Col-0 and Ler-0 were classi ed as the genes responding to hormone stimulus (Supplemental Data Set 3). Among them, 28 genes were expressed higher and 30 genes were expressed lower in Ler-0 than in Col-0 (Supplemental Data Set 3). These differential expressed genes may partly account for the developmental and morphological differences between these two ecotypes. Within the 58 genes, TCH4, which was predicted to affect cell elongation and plant growth, exhibited differential expression between these two ecotypes (Supplemental Data Set 3) (Bergonci et al. 2014;Shinohara et al. 2017;Thussagunpanit et al. 2017). Thus, petiole and cell elongation were examined to verify whether phenotype is consistent with differential gene expression. Figure 5 shows that the petioles of the fth and sixth leaves of Col-0 were much longer than that of Ler-0, and the cells of petioles and hypocotyls of Col-0 were also longer than that of Ler-0 under the same growth conditions ( Figure 5B-E5).

Discussion
In this paper, six transcriptomes were clustered (Table 1) and 671 genes were found to express signi cantly different (P < 0.05) between the seedlings of Col-0 and Ler-0 ( Fig. 1A-1B), and the differences on genes expressed among organs were larger than those among ecotypes of Arabidopsis thaliana (Fig. 1C). Thereby, gene expression pro les de ning tissue and cell speci city must be a critical regulatory event during plant growth and development. Obvious phenotypic differences have been reported between Ler-0 and Col-0, while little is known about the molecular basis (Alonso-Blanco et al. 2000). With the rapid advances on high throughput Next Generation sequencing technologies, it has now become possible to study the gene expression pro les of different ecotypes or genetic mutants and build up genotypephenotype associations.
RNA-seq is a high throughput sequencing technology that can be used for gene expression, mRNA splicing, new transcripts discovery, SNP discovery, SSR mining, and transcripts assembly, etc. Although it is a most advanced sequencing technology, it is also critical to explore available methods for the downstream analysis of the sequences besides generating the sequence data (Chen et al. 2013;Ding et al. 2013;Haas et al. 2013;Trapnell et al. 2010;Wolf 2013;Babarinde et al. 2019). Unfortunately, although a number of RNA-seq datasets have been generated, maximal utilization of these datasets for gene functional analysis and translation of genotypes to phenotypes has fallen far behind. Rich RNA-seq resources provide an ample opportunity to explore new functions of the genes of interest. For example, unknown phenotypes could be predicted from the RNA-seq data of genetic mutants by employing genotype to phenotype prediction, and experimental veri cation of these phenotypes would greatly help understanding the function of the mutant genes. This approach, of course, can also be used to predict phenotypical differences between ecotypes and elucidate the molecular basis of the phenotypical changes. In this work, we explored the possibility of utilizing the publicly available RNA-seq data from Arabidopsis ecotypes to predict possible phenotypes based on the differential gene expression. Our experimental results veri ed that genotype to phenotype prediction is a feasible and e cient way to explore new phenotypes and perhaps linking the phenotypes with gene expression, thus providing better understanding of plant developmental processes and response to environmental cues.
The set of methods in this study were built up based on Gene Ontology (GO). With the help of AgriGO, functional annotations and enrichment of the genes were analyzed. AgriGO was a web-based tool for experimental biologists identifying enriched Gene Ontology (GO) terms, and GO enrichment analysis could nd which GO terms were over-represented using annotations for that gene set (Du et al. 2010). We analyzed the RNA-seq data according to Haas et al (2013), and all signi cantly differential expressed genes between the seedlings of Col-0 and Ler-0 were clustered and enriched by GO analysis (Supplemental Data Set 2). Our RNA-seq analysis suggested that there could be differences in the sensitivity to abscisic acid (ABA) and cold between Ler-0 and Col-0 (Fig. 2). Some genes response to cold stress were up-regulated in Ler-0 (including some cold response marker genes like COR15B, COR15A, ICE1 and etc.) Tang et al. 2020), these datas suggested that Ler-0 should be more tolerance to cold stress than Col-0. But our experimental results showed Ler-0 was more sensitive to cold stress. Similarly, Ler-0 was more sensitive to ABA stress also (Fig. 4). These results indicate that the RNA-seq analysis datas could give a clue to predict phenotypes, but the nal phenotypic outcome still needs to be determined by experiment.
Our work again supports the idea that comparative transcriptome analysis could lead to the discovery of new phenotypes and new gene functions. RNA-seq analysis also predicted other phenotypical differences including defense response (GO:0006952), response to water deprivation (GO:0009414) and hyperosmotic salinity response (GO:0042538) (Supplemental Data Set 1). The differences between Col-0 and Ler-0 on disease resistance, responses to water de cit and osmotic stress were also experimentally veri ed (Parker et al. 1993;Meyre et al. 2001).
In conclusion, we have successfully employed a strategy of genotype to phenotype prediction and experimental veri cation to elucidate the possible molecular basis behind the phenotypes. We have proved the feasibility of this method with the steps described as follows: Firstly, the RNA-Seq data of candidate subjects (ecotypes or mutants) is obtained; then all the signi cantly differential expressed genes (p < 0.05) are analyzed by gene ontology (GO); use of GO enrichment to report the key genes to predict the phenotypic differences; nally, phenotyping to verify the prediction based on differential gene expression. KEGG or other bioinformatics tools and online databases would be helpful for every step. This approach would be a very useful data mining method, and could provide clues for phenotyping and explore the molecular mechanisms of phenotypic traits of organisms or genetic mutants. Abbreviations KEGG, Kyoto Encyclopedia of Genes and Genomes; RNA-Seq, RNA sequencing; SRA, Sequence Read Archive; Col-0, Columbia-0; Ler-0, Landsberg erecta-0; GO, gene ontology; ABA, abscisic acid.