Drought-Responsive Genes in Tomato: Meta-Analysis of Gene Expression Using Machine Learning

doi:10.21203/rs.3.rs-3215460/v1

Download PDF

Article

Drought-Responsive Genes in Tomato: Meta-Analysis of Gene Expression Using Machine Learning

https://doi.org/10.21203/rs.3.rs-3215460/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 08 Nov, 2023

Read the published version in Scientific Reports →

You are reading this latest preprint version

Plants have diverse molecular mechanisms to protect themselves from biotic and abiotic stressors and adapt to changing environments. To uncover the genetic potential of plants, it is crucial to understand how they adapt to adverse conditions by analyzing their genomic data. We analyzed RNA-Seq data from different tomato genotypes, tissue types, and drought durations. We used a time series scale to identify early and late drought-responsive gene modules and applied a machine learning method to identify the best responsive genes to drought. We demonstrated six candidate genes of tomato viz. Fasciclin-like arabinogalactan protein 2 (FLA2), Amino acid transporter family protein (ASCT), Arginine decarboxylase 1 (ADC1), Protein NRT1/PTR family 7.3 (NPF7.3), BAG family molecular chaperone regulator 5 (BAG5) and Dicer-like 2b (DCL2b) were responsive to drought. We constructed protein-protein interaction networks to identify their potential interactors and found them drought-responsive. The identified candidate genes can help to explore the adaptation of tomato plants to drought. Furthermore, these candidate genes can have far-reaching implications for molecular breeding and genome editing in tomatoes, providing insights into the molecular mechanisms that underlie drought adaptation. This research underscores the importance of the genetic basis of plant adaptation, particularly in changing climates and growing populations.

Biological sciences/Computational biology and bioinformatics/Computational models

Biological sciences/Computational biology and bioinformatics/Gene ontology

Biological sciences/Computational biology and bioinformatics/Machine learning

Abiotic stresses

Molecular breeding

Machine learning

Responsive genes

Adapted crops

The growing global population and the worsening effects of climate change present a formidable challenge to the field of agriculture. To meet the increasing demand for food, agricultural scientists must find ways to increase production despite the adverse effects of climate change. One of the major factors that can reduce crop yields is water scarcity, which is a common problem in many regions, such as South Asia, where prolonged summers lead to low water availability and moisture content in the soil (Hayat et al., 2008). The prolonged stress can severely diminish plant growth and productivity, and cause the accumulation of reactive oxygen species (ROS) in the plant. Lower carbon fixation due to stomata closure can lead to reduced NADP + regeneration through the Calvin cycle, which, when coupled with changes in chlorophyll production, can result in an increased production of ROS in water-stressed plants (Biehler et al., 1996; Zhang et al., 2022). The accumulation of ROS can lead to a cascade of harmful effects, including peroxidation of lipids, protein oxidation, enzymatic activity inhibition, oxidative damage to nucleic acids, and ultimately, cell death. These consequences can have serious implications for agriculture, as they can ultimately lead to decreased crop yields and food shortages. Therefore, finding effective ways to mitigate the impact of water scarcity on crops is crucial to ensure food security in the face of climate change.

Tomatoes are a widely cultivated crop throughout the world, originating in southern America as a cultivated solanaceous plant known as Lycopersicon esculentum. Despite its origins, tomatoes are now produced and consumed worldwide, with demand slightly outstripping production (Bai et al., 2007). Due to their relatively short life cycle, easy cultivation, and simple genetics with lower gene duplication, tomatoes are considered to be a standard model of vegetables (Bergougnoux, 2014). Molecular studies of tomatoes can also provide valuable insights into other related species (Kimura & Sinha, 2008). While tomatoes are economically important, there is still much to learn about their molecular responses to various abiotic stresses, such as water stress.

Next-generation sequencing technologies have rapidly become the preferred method for characterizing and quantifying entire genomes. One powerful application of these high-throughput sequencing methods is RNA sequencing (RNA-Seq), which allows researchers to gain insight into the transcriptome of cells. Compared to previous approaches such as Sanger sequencing and microarray-based methods, RNA-Seq provides higher resolution and greater sensitivity for characterizing the dynamic nature of the transcriptome. In addition to quantifying gene expression, RNA-Seq data can facilitate the discovery of novel transcripts, identification of alternative splicing events, and detection of allele-specific expression. Recent advances in the RNA-Seq workflow, including improvements in sample preparation, library construction, and data analysis, have enabled researchers to uncover even more of the functional complexity of transcription (Kukurba & Montgomery, 2015).

The analysis of RNA-Seq data is a significant challenge due to its complex and high-dimensional nature. However, in recent years, machine learning has emerged as a successful approach in various biological contexts. Machine learning algorithms have demonstrated impressive efficiency in handling large datasets that possess characteristics such as noise, high dimensionality, and incompleteness. One notable advantage is that these algorithms require minimal assumptions about the underlying probability distributions and generation methods of the data. In contrast to traditional statistical approaches that focus on inference, machine learning methods prioritize prediction, providing greater flexibility to accommodate diverse data characteristics. Machine learning plays a crucial role in the analysis of biological data, particularly RNA-Seq data, due to its capability to handle nonlinearity. Linear models often fall short of capturing the intricate relationships present in biological datasets. Conversely, machine learning algorithms excel at capturing and utilizing the complex patterns inherent in such data. This ability enhances our understanding and interpretation of RNA-Seq data, enabling more accurate predictions and deeper insights into biological phenomena (Mahood et al., 2020).

In this research, we gathered transcriptomics data from various tomato genotypes subjected to different drought conditions. The experiment encompassed diverse genotypes and included six time points representing varying durations of drought, along with corresponding control groups. Our analysis focused on identifying genes that exhibited differential expression, allowing us to capture both early and late responsive genes to drought. To achieve this, we employed machine learning models, systematically exploring the dataset to uncover the genes most affected by drought in tomato plants. Using the identified genes, we constructed a protein-protein interaction (PPI) network and determined their targets as drought-responsive. Additionally, we observed that the chromosomal positions of these genes overlapped with previously reported drought-responsive Quantitative Trait Loci (QTLs) in tomato.

The study has also investigated differentially expressed genes and enriched Gene Ontology (GO) categories in tomatoes subjected to prolonged drought stress and subsequent rehydration. These studies revealed that down-regulated genes after drought stress were enriched in GO categories related to photosynthesis and cell proliferation, while upregulated genes belonged to GO categories more directly associated with stress responses (Iovieno et al., 2016; Alam et al., 2010; Kosová et al., 2011). However, a comprehensive examination of genetic responses in tomato plants to drought stress is still lacking. Therefore, the primary objective of our study was to employ machine learning techniques to investigate the responsive genes of tomato plants under prolonged water deficit conditions. Through our systematic approach, we aimed to identify the genes specifically responsive to this particular stress condition in plants.

Differentially expressed gene patterns

The expression of genes can vary over time in response to different stimuli, and the identification of DEGs can provide insights into the underlying biological processes. In this study, the expression of genes in a time series scale was analyzed, and the results were visualized in Figure 2.

The DEGs were compared with their individual controls, and the analysis revealed interesting patterns (Figure 3 and Figure 4). Specifically, it was observed that the most upregulated genes were found in the later stages of stress, specifically at 120 hours post-treatment (Figure 3). This finding suggests that the cellular stress response becomes more pronounced as the duration of the stress increases. Interestingly, some genes were also found to be upregulated in the early stages of stress and overlapped with those found in the later stages. This observation implies that the cellular stress response may involve immediate and delayed mechanisms. And in the medium stages of water deficit, a few genes were found to be significantly upregulated. Although the number of DEGs was lower in this stage, it is important to note that these genes may play important roles in the early stages of the stress response. This study provides valuable insights into the temporal dynamics of gene expression in response to stress. The identification of DEGs at different time points highlights the importance of considering the duration of stress when studying cellular responses. Additionally, the overlapping DEGs observed across different stages suggest that the cellular stress response involves a complex interplay of immediate and delayed mechanisms.

Gene ontology enrichment analysis

Gene ontology enrichment analysis is a powerful tool used to interpret the biological function of differentially expressed genes (DEGs) in a high-throughput manner. In this study, the functions of DEGs were predicted from GO and the results were visualized in Figure 5. The analysis revealed that a large proportion of DEGs (approximately 600 genes) were involved in organic substance biosynthetic processes and organonitrogen compounds. This finding suggests that the cellular stress response involves the production and modification of organic compounds, which play important roles in cellular metabolism and signaling pathways. Moreover, approximately 500 genes were found to be related to stress, indicating that the cellular stress response involves the activation of various stress response pathways. This finding is in agreement with previous studies that have shown that stress response genes are upregulated in response to various environmental stimuli. Additionally, more than 400 genes were found to respond to chemicals, highlighting the importance of chemical signaling in cellular responses. These genes may be involved in the synthesis, transport, and degradation of chemical signals, or they may play roles in downstream signaling pathways that are activated by chemical stimuli. Furthermore, approximately 200 genes were found to be related to translation and responded to osmotic stress. This finding suggests that osmotic stress may affect translation processes, which play important roles in the synthesis of proteins that are involved in cellular responses to stress. In summary, the gene ontology enrichment analysis conducted in this study provides valuable insights into the biological function of DEGs in response to stress. The identification of genes involved in various metabolic, signaling, and stress response pathways highlights the complexity of the cellular stress response and emphasizes the importance of considering multiple biological processes when studying stress responses.

Machine learning performance

In this study, we utilized gene expression values as features to train classification models using the XGBoost in R. XGBoost is a popular implementation of the gradient boosting algorithm, which combines multiple weak learners, such as shallow trees, into a strong one. To evaluate the performance of our models, we split our dataset into training and test sets. The accuracy of the models was measured using the area under the curve (“auc”) value, which is a commonly used metric for evaluating classification models. The results of our analysis, shown in Figure 6, demonstrate that the accuracy of the training and test sets are quite close to each other, indicating that our models are well-fitted. Furthermore, we observed that the accuracy of the models increased over iterations, suggesting that the models learned from the data and were able to improve their performance as they were trained with more data. The use of the XGBoost in our study is particularly beneficial due to its ability to handle high-dimensional datasets, such as those generated from gene expression studies. By incorporating multiple weak learners, the algorithm can effectively capture the complex relationships between gene expression values and biological outcomes, such as disease states or cellular responses to stress. In summary, our results demonstrate the effectiveness of XGBoost in classifying biological samples based on gene expression values. The close agreement between the accuracies of the training and test sets suggests that our models are robust and generalizable. The use of XGBoost has enabled us to extract meaningful insights from high-dimensional gene expression datasets, and its flexibility makes it a valuable tool for future studies in this area.

Feature importance

In this study, we employed a robust approach to determine the most influential genes that contribute to the classification of samples based on their gene expression values. Following the training and validation of our classification model using the XGBoost algorithm, we extracted the features with their corresponding importance scores. The importance score of a feature reflects its contribution to the classification of samples and thus provides valuable insights into the underlying biological mechanisms. Among the top six genes identified as candidate genes, we found that FLA2 gene had the highest importance score (Figure 7). This observation suggests that the expression of FLA2 gene is a critical factor in determining the classification of samples in our study. FLA2 gene encodes for fasciclin-like arabinogalactan protein 2, which is a cell surface protein involved in various cellular processes such as cell adhesion, growth, and differentiation. The high importance score of FLA2 gene in our analysis may indicate its involvement in the cellular stress response and thus may have potential implications in the development of stress-tolerant crops. It is worth noting that the other candidate genes identified in our analysis also play important roles in various cellular processes. Further investigation of these genes could provide valuable insights into the underlying biological mechanisms involved in stress responses. In conclusion, the identification of FLA2 gene as the most important gene in our study highlights its potential significance in stress responses. The use of the XGBoost algorithm has enabled us to extract meaningful insights from high-dimensional gene expression datasets, and its flexibility makes it a valuable tool for future studies in this area.

Protein-protein network

To gain a better understanding of the biological functions and interactions of the candidate genes identified in our study, we constructed protein-protein interaction networks for each gene using publicly available databases. The resulting network for each candidate gene is shown in Figure 8, with nodes representing genes and edges representing interactions between genes. In these networks, we observed that the candidate genes identified in our study were represented by the color red, indicating their high importance in abiotic stress response. We also observed that the targets of these candidate genes showed abiotic stress-related functions, suggesting their involvement in the cellular stress response. The protein-protein interaction networks provide valuable insights into the potential pathways and mechanisms involved in the cellular stress response. By identifying the interactions between genes, we can better understand the complex interactions and functions of genes in the network. This information could be further used to develop targeted interventions for improving stress tolerance in crops.

Overall, the protein-protein interaction networks of the candidate genes identified in our study highlight their potential significance in stress responses and provide a foundation for further investigation into their roles in stress tolerance (Supplementary 2).

Candidate genes overlapped with drought QTLs of Tomato

In this study, we aimed to explore the potential role of three candidate genes, FLA2, ASCT, and NPF7.3, in the response of tomato plants to drought stress. To investigate this, we compared the location of these genes with the previously identified QTLs associated with drought-related traits in tomato by Diouf et al. (2018).

Our analysis revealed that all three candidate genes overlapped with QTLs that were previously associated with drought stress, suggesting their potential involvement in the plant's response to water deficit conditions (Table 1). Specifically, the QTLs RIP, SSC, NFr, and FW were identified under drought stress for the traits of time to ripe, soluble solid content, number of fruit, and fruit weight, respectively.

The importance of our findings lies in the potential for identifying specific genes that contribute to the drought stress response of tomato plants. A better understanding of the genetic mechanisms involved in this response is crucial for developing crop varieties that can withstand environmental stressors such as water scarcity.

Our study suggests that FLA2, ASCT, and NPF7.3 could be promising candidate genes for further investigation in the context of drought stress in tomato plants. Future studies could focus on elucidating the specific molecular pathways through which these genes affect the plant's response to water deficit conditions. This could lead to the development of more targeted breeding programs that utilize these genes to develop drought-tolerant tomato varieties.

Table 1. Drought stress QTLs and overlapped candidate genes.

Transcripts	Gene	Physical Map(Mbp)	QTLs
Solyc07g045440.1.1	FLA2	58.6651	RIP3.1, SSC4.1, RIP1.1, NFr1.1
Solyc03g078150.3.1	ASCT	51.555	FW2.2, NFr2.2, SSC11.1, RIP3.1
Solyc01g080870.3.1	NPF7.3	80.0395	SSC1.1, NFr1.1

The abstract of our study highlights the crucial importance of gaining a thorough understanding of the genetic composition of tomato plants to improve their ability to withstand drought stress, especially given the significant global demand for this staple crop. To achieve this objective, we employed state-of-the-art machine learning techniques to identify the most responsive genes associated with prolonged water scarcity in tomato plants. Our investigation uncovered six promising candidate genes, with FLA2 emerging as the top gene, indicating its potential as a critical target for enhancing drought resistance in tomatoes. Moreover, the gene ontology enrichment analysis conducted in our study demonstrated that the functions of the identified candidate genes were closely linked to drought stress. These findings are consistent with previous research (Iovieno, et al. 2016), reinforcing the notion that these candidate genes play a pivotal role in imparting drought tolerance in tomato plants. Our study significantly advances our knowledge of the molecular mechanisms underlying the response of tomato plants to drought stress. The identification of FLA2 and other candidate genes, coupled with the functional analysis of their gene ontology, lays a robust foundation for future research focused on developing drought-resistant tomato varieties. Such research can help mitigate the negative impact of water scarcity on global food production, making our study highly pertinent and timely.

The findings of this study demonstrate that specific genes were upregulated in later stages of stress, while others overlapped with early stages, aligning with previous research on plant response to drought stress (Atkinson et al., 2013; López-Galiano et al., 2019). These results underscore the complexity of the molecular mechanisms involved in plant responses to stress and emphasize the need for a comprehensive understanding of these mechanisms that encompass multiple genetic pathways. Additionally, the identification of these genes on different chromosomes suggests that they may have distinct modes of action in responding to stress, underscoring the significance of investigating multiple genes and their interactions to gain a deeper understanding of the fundamental molecular processes. By shedding light on the intricate mechanisms that govern plant response to drought stress, this study contributes to advancing our knowledge in the field and lays a foundation for future research aimed at developing drought-tolerant plant varieties.

This article emphasizes the critical importance of identifying candidate genes involved in the response to abiotic stress, particularly drought stress, and understanding their interactions using protein-protein interaction networks. Our study focused on six genes (FLA2, ASCT, DCL2b, ADC1, NPF7.3, and BAG5), and the constructed protein-protein interaction networks offer valuable insights into the potential pathways and mechanisms involved in drought responses. This research provides a foundation for developing targeted interventions to enhance stress tolerance in crops. The study highlights the significant role of the identified candidate genes in the response to abiotic stress, especially drought stress, with these genes exhibiting functions related to the cellular stress response. Our findings are consistent with previous research (Muhammad et al., 2019), which identified various genes, including SlMAPK3, involved in drought stress responses in tomatoes. The overexpression of SlMAPK3 improved the tolerance of tomato plants to drought stress by regulating the expression of genes involved in drought stress responses. Therefore, our study makes a valuable contribution to ongoing research on drought stress tolerance in tomatoes. The identification of candidate genes involved in drought stress response and the construction of protein-protein interaction networks lay a foundation for further investigation into the molecular mechanisms involved in stress tolerance. These findings could be leveraged to develop new strategies for improving crop productivity and sustainability in the face of drought stress and other environmental stressors, ultimately benefiting global food security(Zhou et al., 2007; Kosmala et al., 2012).

The study has provided compelling evidence that FLA2, ASCT, and NPF7.3 genes overlap with QTLs associated with drought-related traits in tomato plants. In a recent study by Verinico et al. (2022), FLA2 was identified as drought-responsive, which further confirms the importance of this gene in the plant's response to water deficit conditions.

Developing crops that are more resilient to environmental stressors like water scarcity has become an urgent priority, given the growing global population and climate change challenges facing agriculture. By understanding the molecular pathways through which these candidate genes affect the plant's response to drought stress, researchers could develop targeted breeding programs to develop drought-tolerant tomato varieties.

The integration of bioinformatics and computational methods, particularly machine learning, has greatly advanced the field of plant stress response research. The study presented in this paper is a remarkable example of how such approaches can provide novel insights into the genetic mechanisms underlying plant responses to environmental stress (Auer et al., 2010). By identifying candidate genes and regulatory elements involved in stress response, machine learning can help develop crop breeding programs that enhance stress tolerance, ultimately contributing to addressing food security challenges (Rico-Chávez et al., 2022; Veronico et al., 2022).

The use of machine learning allows for the analysis of large amounts of genomic data and the identification of complex gene networks involved in stress response. This approach has enabled researchers to identify critical genetic targets and pathways that can be manipulated to enhance stress tolerance in crops. Such insights can be leveraged to develop crop varieties that can withstand harsh environmental conditions, including drought stress, which is one of the most significant challenges facing global food production (Rico-Chávez et al., 2022; Lopez et al., 2020).

Furthermore, machine learning can also help identify and prioritize the most promising candidate genes and regulatory elements for further experimental validation. This approach can significantly accelerate the breeding process by reducing the time and resources required to develop new crop varieties. Therefore, the integration of bioinformatics and computational methods, particularly machine learning, represents a powerful approach to identifying the genetic mechanisms underlying plant responses to environmental stress.

Such approaches have the potential to make significant contributions to addressing food security challenges, particularly in areas where drought stress is a significant constraint on crop production. The use of such methods can ultimately lead to the development of drought-resistant crop varieties, which is critical for ensuring global food security in the face of climate change and other environmental challenges. Therefore, the integration of bioinformatics and computational methods is a promising tool for future research aimed at developing crops that can withstand environmental stressors and ultimately improve global food production.

RNA Seq data

For drought stress, RNA Seq data of tomatoes were extracted from Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). The data was collected using the following keywords in GEO: "Solanum lycopersicum"[Organism] OR Tomato [All Fields]) AND ("drought response"[MeSH Terms] OR Drought stress[All Fields]) (Figure 1). RNA Seq data were downloaded by SRAToolkit. Each treatment has its corresponding controls. Data were sorted according to time series points of 3, 6, 48, 72, 96, and 120 hours of treatment respectively. There was also data for recovery samples (Supplementary 1).

Mapping

Reads were mapped against the coding sequence (CDS) of the Heinz 1706 genome. The latest annotation (ITAG4.1, https://solgenomics.net/) of the Heinz 1706 genome was used. Kallisto was used for mapping. 34688 genes were found after mapping. The abundance file contained transcript id, length, effective length, estimated counts, and TPM (Transcript per million). Transcript id estimated counts and TPM values of all samples were gathered in one tab-separated value file for further analysis. The estimated counts of all samples per time point were averaged and labeled as a single time point.

Gene expression analysis

Differentially expressed genes (DEGs) were analyzed using the Bioconductor edgeR package in R. Estimated counts value of all samples were taken to analyze differentially expressed genes. Data were normalized with library size. Dispersions of samples were calculated with glm approaches of the edgeR package by comparing the treatments with their respective controls. Exact tests were carried out to verify the significance of change gene expression and their log fold change values. Multiple testing corrections were done with the topTags function. It ranks genes by p-value. The threshold of considering differentially expressed genes was False discovery rate (FDR)<0.01. For upregulated genes, the logFC threshold was >1, and for downregulated genes, it was < -1. Visualizing of gene expression was done with heatmap using the Complexheatmap Bioconductor package of R. Gene expression was shown with genotype-specific, tissue-specific, stress-specific, heat level-specific, and sample-specific using row Annotation, bottom Annotation, and top Annotation functions of Complex heatmap package in R. Intersection and total set of genes of samples was visualized with an upset plot using R for both upregulated genes and downregulated genes. Some selected transcription factors from diversified families were collected in literature which are said to be drought-responsive genes. The expressions of that transcription factor were also analyzed across time series scales by comparing them with their respective controls.

Gene ontology analysis

Gene ontology enrichment analysis of significant genes of samples was carried out with the topGO package in R. The genome annotation format (GAF, ITAG4.1) file of tomato was collected from (Carolyn, 2021, https://doi.org/10.25739/zh2v-4p15) and a map file was created with the bash script. The map file contained two columns with tabs separated where the first column was GO terms and the second column was genes to respective GO terms with space-separated. The functions were considered only those who were involved in biological processes. Fisher's classic test was done to find significant levels of GO terms and their functions. The top 15 functions were selected as thresholds.

Candidate selection

We used a tree model with gradient boosting, XGBoost in R to train and test the models. We split the data for each species into training (80%) and testing (20%) sets. We used five-fold internal cross-validation to select the optimized hyperparameters. We tuned “nrounds” (number of trees), “colsample_bytree” (the proportion of features for constructing each tree), “subsamples” (the portion of training data samples for training each additional tree), and “eta” (shrinkage of feature weights to make the boosting process more conservative and prevent overfitting) in an XGBoost: classification model. We used the XGBoost-generated feature importance score that indicates how useful each feature was in the construction of the model.

Protein-protein interaction

To find out the likely interactions of candidate genes the STRING ((https://string-db.org) tool was used (Karimizadeh et al., 2019). For the search interface, parameters were set to full network type, confidence score >0.4, and more than 10 interactors. In the organism interface, it was restricted to Solanum lycopersicum. In the networks graph, the colored nodes display the proteins and the edges represent the interactions.

In this study, we demonstrated that some genes of tomato were significantly upregulated in later stages of water stress, while some other genes were also found in the early stages and overlapped with later stages. Our results suggest that some genes play distinct roles in the plant's response to drought stress. Therefore, a comprehensive understanding of these genes is crucial for developing strategies to improve crop resistance to stress. Furthermore, our machine learning approach resulted in identification of six candidate genes, with FLA2 having the highest importance score, provides a starting point for further research into the specific functions and mechanisms of the genes in the context of drought stress. The gene ontology enrichment analysis predicted that the functions of candidate genes are related to drought stress, indicating that they may play crucial roles in the plant's ability to cope with environmental stress. Our study suggests that FLA2, ASCT, and NPF7.3 could be promising candidate genes for further investigation in the context of drought stress in tomato plants. Overall, this study highlights the importance of understanding the genetic makeup of tomato plants to develop strategies to increase stress resistance and fruit quality. The findings of this study shed light on the genetic makeup of tomato plants and their response to prolonged water deficit, and environmental stress that can significantly impact crop yield and quality.

Author Contributions

A.K. designed the study. R.H.C and F.S.E. carried out the study and did the data analysis. R.H.C drafted the manuscript.. P.K.J., S.D.G., R.A. and T.I. edited the manuscript. M.H.R., M.A.R.B., and A.K. supervised the work. M.H.R. managed the partial fund. A.K. and T.I. validated the results and provided the resources. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

The study was partially supported by the Noakhali Science and Technology University Research Cell (NSTU/RC-AG-02/MS/22/18).

Conflicts of Interest

Abul Khayer has moved to Computomics. The other authors declared that there is no conflict of interest.

Data availability

The source of all data analysed during this study are included in the supplementary files.The codes for the current study available from the corresponding authors on reasonable request.

Bergougnoux V. The history of tomato: from domestication to biopharming. Biotechnology advances. 2014;32(1):170-89. doi: 10.1016/j.biotechadv.2013.11.003.
Kosová K, Vítámvás P, Prášil IT, Renaut J. Plant proteome changes under abiotic stress—contribution of proteomics studies to understanding plant stress response. Journal of proteomics. 2011;74(8):1301-22. doi: 10.1016/j.jprot.2011.02.006.
Alam I, Sharmin SA, Kim KH, Yang JK, Choi MS, Lee BH. Proteome analysis of soybean roots subjected to short-term drought stress. Plant and Soil. 2010;333:491-505. doi:10.1007/s11104-010-0365-7.
Iovieno P, Punzo P, Guida G, Mistretta C, Van Oosten MJ, Nurcato R, et al. Transcriptomic changes drive physiological responses to progressive drought stress and rehydration in tomato. Frontiers in plant science. 2016;7:371. doi: 10.3389/fpls.2016.00371.
Kosmala A, Perlikowski D, Pawłowicz I, Rapacz M. Changes in the chloroplast proteome following water deficit and subsequent watering in a high-and a low-drought-tolerant genotype of Festuca arundinacea. Journal of experimental botany. 2012;63(17):6161-72. doi:10.1093/jxb/ers265.
Rico-Chávez AK, Franco JA, Fernandez-Jaramillo AA, Contreras-Medina LM, Guevara-González RG, Hernandez-Escobedo Q. Machine learning for plant stress modeling: A perspective towards hormesis management. Plants. 2022;11(7):970. doi:10.3390/plants11070970.
Zhou J, Wang X, Jiao Y, Qin Y, Liu X, He K, et al. Global genome expression analysis of rice in response to drought and high-salinity stresses in shoot, flag leaf, and panicle. Plant molecular biology. 2007;63:591-608. doi: 10.1007/s11103-006-9111-1.
Kimura S, Sinha N. Tomato (Solanum lycopersicum): a model fruit-bearing crop. Cold Spring Harbor Protocols. 2008;2008(11):pdb-emo105.
Bai Y, Lindhout P. Domestication and breeding of tomatoes: what have we gained and what can we gain in the future?. Annals of botany. 2007;100(5):1085-94. doi: 10.1093/aob/mcm150.
Karimizadeh E, Sharifi-Zarchi A, Nikaein H, Salehi S, Salamatian B, Elmi N, et al. Analysis of gene expression profiles and protein-protein interaction networks in multiple tissues of systemic sclerosis. BMC Medical Genomics 2019, 121, 1-12. doi: https://doi.org/10.1186/s12920-019-0632-2.
Osakabe Y, Osakabe K, Shinozaki K, Tran LS. Response of plants to water stress. Frontiers in plant science. 2014;5:86. doi:10.3389/fpls.2014.00086.
Atkinson NJ, Lilley CJ, Urwin PE. Identification of genes involved in the response of Arabidopsis to simultaneous biotic and abiotic stresses. Plant physiology. 2013;162(4):2028-41. doi: 10.1104/pp.113.222372.
Auer PL, Doerge RW. Statistical design and analysis of RNA sequencing data. Genetics. 2010;185(2):405-16. doi:10.1534/genetics.110.114983.
Kukurba KR, Montgomery SB. RNA sequencing and analysis. Cold Spring Harbor Protocols. 2015;2015(11):pdb-top084970. doi:10.1101/pdb.top084970.
Hayat S, Hasan SA, Fariduddin Q, Ahmad A. Growth of tomato (Lycopersicon esculentum) in response to salicylic acid under water stress. Journal of plant interactions. 2008;3(4):297-304. doi: 10.1080/17429140802320797.
Biehler K, Fock H. Evidence for the contribution of the Mehler-peroxidase reaction in dissipating excess electrons in drought-stressed wheat. Plant physiology. 1996;112(1):265-72. doi: 10.1104/pp.112.1.265.
López-Galiano MJ, García-Robles I, González-Hernández AI, Camañes G, Vicedo B, Real MD, et al. Expression of miR159 is altered in tomato plants undergoing drought stress. Plants. 2019;8(7):201. doi: 10.3390/plants8070201.
Diouf IA, Derivot L, Bitton F, Pascual L, Causse M. Water deficit and salinity stress reveal many specific QTL for plant growth and fruit quality traits in tomato. Frontiers in Plant Science. 2018;9:279. doi: 10.3389/fpls.2018.00279
Muhammad T, Zhang J, Ma Y, Li Y, Zhang F, Zhang Y, et al. Overexpression of a mitogen-activated protein kinase SlMAPK3 positively regulates tomato tolerance to cadmium and drought stress. Molecules. 2019;24(3):556. doi: 10.3390/molecules24030556.
Veronico P, Rosso LC, Melillo MT, Fanelli E, De Luca F, Ciancio A, et al. Water stress differentially modulates the expression of tomato cell wall metabolism-related genes in meloidogyne incognita feeding sites. Frontiers in Plant Science. 2022;13:817185. doi:10.3389/fpls.2022.817185.
Zhang T, Wang Y, Ma X, Ouyang Z, Deng L, Shen S, et al. Melatonin alleviates copper toxicity via improving ROS metabolism and antioxidant defense response in tomato seedlings. Antioxidants. 2022;11(4):758. doi: 10.3390/antiox11040758
Mahood EH, Kruse LH, Moghe GD. Machine learning: a powerful tool for gene function prediction in plants. Applications in Plant Sciences. 2020;8(7):e11376. doi: 10.1002/aps3.11376

No competing interests reported.

Supplementary.zip

Download PDF

Journal Publication

published 08 Nov, 2023

Read the published version in Scientific Reports →

Editorial decision: Major revision
12 Sep, 2023
Reviews received at journal
09 Sep, 2023
Reviewers agreed at journal
30 Aug, 2023
Reviewers invited by journal
28 Aug, 2023
Editor assigned by journal
21 Aug, 2023
Editor invited by journal
20 Aug, 2023
Submission checks completed at journal
20 Aug, 2023
First submitted to journal
29 Jul, 2023

You are reading this latest preprint version

Drought-Responsive Genes in Tomato: Meta-Analysis of Gene Expression Using Machine Learning

Status:

Journal Publication

Version 1

Abstract

Figures

Introduction

Results

Discussion

Methods and Materials

Conclusion

Declarations

References

Additional Declarations

Supplementary Files

Status:

Journal Publication

Version 1