Experimental materials
An interspecific cross between the female N.Peter Slocum’ and male N. micrantha was performed as described in a previous report[3]. Self-pollinated ‘Peter Slocum’ and N. micrantha were successful. In addition, ‘Peter Slocum’ has a chromosome number of 84 (2n = 6x), whereas N. micrantha has a chromosome number of 56 (2n = 4x) [3]. These plants were grown in ponds in Xingxiang, Zhenjiang, Jiangsu Province, China. These plants were collected from shanghai chenshan plant science research center, Chinese Academy of Sciences (Shanghai, China).
The stigmas of 0, 2, and 6 HAP were collected. Each treatment had three biological repeats. The stigmas of 0 HAP (fresh male parent pollen was placed on stigma) were used as the control, and the stigmas from 2 and 6 HAP were used as the treatment to dynamically study the interaction between the pollen and stigma after pollination. After collection, the three samples, the stigmas of 0 HAP, the stigmas of 2 HAP, and the stigmas of 6 HAP, were immediately frozen in liquid nitrogen and stored at - 80°C.
High-throughput RNA-seq and data processing
Total RNA was extracted using Trizol reagent according to the manufacturer’s protocol (Takara Bio Inc., Otsu, Japan). The total RNA was checked for quality and quantity using an Agilent 2100 bioanalyzer (Agilent Technologies, CA, USA). The mRNA samples were enriched by oligo(dT) magnetic beads and then cut into fragments with fragmentation buffer at 80°C. The first strand of cDNA was synthesized with random hexamer as primer. RNase H, dNTPs and DNA polymerase I were used to synthesize the second strand of cDNA using the first strand as template. After purification and end repair, adaptor sequences and double-stranded DNA poly A were attached to the ends of cDNAs. After selecting the fragment size and using Agilent 2100 biological analyzer for quality detection, the cDNA library was constructed by PCR amplification. Finally, the Illumina HiSeq 2500 system was used to sequence the qualified cDNA libraries. Three biological replicates were used in the RNA-seq experiments involving each sample, and sequenced as separate libraries.
Raw data (raw reads) of fasta format were firstly filtered using fastp software. In this step, clean data (clean reads) were obtained by removing reads containing adapter, reads containing ploy-N and low quality reads from raw data. At the same time, Q20, Q30 and GC content the clean data were calculated. All the downstream analyses were based on the clean data with high quality. The transcripts were assembled by Trinity (http://trinityrnaseq.github.io). Clean reads with a certain overlap length were initially merged into long fragments without N (called contigs). TGICL software was used to cluster the related contigs to obtain unigenes (without N) which could not be extended at both ends, and nonredundant unigenes were acquired after removing redundancies[59]. The functional annotation of all-unigenes was performed using a BLAST search (http://blast.ncbi.nlm.nih.gov/Blast.cgi) against the GO, Pfam, KEGG, Nr, and Swiss-Prot databases.
The unigenes obtained using Trinity were used as reference sequences, and the clean reads of each sample were compared to the reference sequences by using the default parameters of bowtie 2 in RSEM software. The results of bowtie2 comparison were statistically analyzed by RSEM, and the number of read counts of each sample compared to each gene was further obtained, and the number of read counts was converted to FPKM, then the gene expression level was analyzed. The FPKM (fragments per kilobase per million mapped reads) method eliminated the effect of gene length and sequencing depth on gene expression calculation. The difference of gene expression between samples was directly compared with FPKM values. The “base mean” value for DEGs identification was obtained by using the DESeq package. The absolute value of log2ratio≥1 and FDR ≤0.01 were used as the threshold of significant difference in gene expression between the two samples.
Functional annotation and classification of the DEGs were conducted using the Blast 2 GO program (http://www.blast2go.com) [60]. Additionally, a KEGG pathway analysis (https://www.genome.jp/kegg/) was performed. The heat map was produced using Cluster 3.0 and treeview.
Label free analysis of the stigma proteome of water lily
Protein extraction and peptide enzymolysis. Protein extraction was performed using the SDT (4% SDS, 100 mM DTT, 150 mM Tris-HCl pH 8.0) method. The protein concentrations were quantified with the BCA Protein Assay Kit (Bio-Rad, USA), and the samples were stored at –80°C. Protein digestion (200 g for each sample) was performed using the filter aided proteome preparation procedure described by Wisniewski [61]. The peptides from each sample were desalted on C18 cartridges, concentrated by vacuum centrifugation, and reconstituted in 40 L of 0.1% (v/v) formic acid.
MS/MS protein identification and quantification. NanoLC-MS/MS analysis was performed on each peptide fraction. We first loaded the peptide mixture onto reversed-phase trap column (Thermo Fisher Scientific Acclaim PepMap100, 100 μm × 2 cm, nanoViper C18) and connected it to C18-reversed phase analytical column (Thermo Fisher Scientific Easy Column, 75 μm inner diameter, 10 cm long, 3 μm, C18-A2) in buffer A (0.1% formic acid), and then separated it with the linear gradient of buffer B (0.1% formic acid, 84% acetonitrile) under the flow rate of 300 nL / min controlled by intelliFlow technology.
The LC-MS/MS analysis was carried out on the Q Exactive mass spectrometer (Thermo Fisher Scientific) coupled with a Easy nLC (Proxeon Biosystems, now Thermo Fisher Scientific) for 240 min. Mass spectrometer was operated under positive ion mode. The MS data was obtained through the data-dependent top10 method and dynamically selected the most abundant precursor ions from the measurement scan (300–1,800 m/z) to obtain higher collision energy dissociation (HCD) fragments. The automatic gain control target, the maximum injection time and dynamic exclusion duration was set to 1e6, 50 ms and 60.0 s respectively. Measurement scans were acquired at 70,000 resolution at 200 m/z; HCD spectra resolution was set to 17,500 at 200 m/z, the normalized collision energy was 30 eV, the isolation width was 2 m/z, and the under-fill ratio was 0.1%.
The MS raw files were dealed with using Maxquant1.5.3.17 software for protein identification [62]. In this study, we searched the acquired MS/MS spectra based on the predicted protein databases translated from the above transcriptome databases. The maximum FDR and the minimum peptide length was set to 1% and six amino acids for both peptides and proteins respectively. Other parameters were set as described below: enzyme = trypsin; max missed cleavage = 2; fixed modification: carbamidomethyl (C); peptide mass tolerance = ± 20 ppm; variable modification: oxidation (M), acetyl (protein N-term). Protein quantification was determined by unique peptides and ‘razor’ [62–63], and label free quantitation calculation was performed [64]. For each fraction, the peptide was matched in different LC-MS/MS operations based on the mass and retention time (setting matching options between runs in MaxQuant) using a two minute time window.
DEPs were analyzed as significantly up-regulated or down-regulated. For quantitative changes, a critical value of 2.0-fold was set to determine the up-regulated and down-regulated proteins, at least two replicates with a p-value of <0.05.
Bioinformatics analysis. Each sample was repeated three times in label free analysis. In order to study the effect of differentially expressed proteins on different biological processes, we carried out enrichment analysis of GO and KEGG. Go and KEGG were enriched by Fisher’s exact test, and the whole quantitative protein annotation was used as the background data set. Benjamini-Hochberg was modified by multiple testing, so that the derived p-values could be adjusted [65]. Only pathways and functional classifications with p-values < 0.05 were identified as significant.
Correlation analysis of transcriptomics and proteomics
In a comparison group, if a gene and its corresponding protein were expressed, it is considered that the gene and its corresponding protein were correlated. Next, we determined the significant expression of correlated genes and proteins in the comparison group. If a gene and its corresponding protein were significantly expressed in a comparison group, they were named cor-DEGs-DEPs. GO and KEGG enrichment analysis for cor-DEGs-DEPs were then conducted, and the Spearman correlation coefficient was calculated.
PRM analysis
Selection of target peptides for PRM analysis. Peptide mixture of the nine samples analyzed by label free analysis was prepared by trypsin. The equivalent peptides in each sample was collected, and 2 μg of the collected sample was introduced into the HPLC system through a capture column (5 μm-C18, 100 μm×50 mm) and an analysis column (3 μm-C18, 75 μm×200 mm). Next, the separated peptides were analyzed by Q-Exactive mass spectrometer (Thermo Fisher Scientific). Maxquant 1.5.3.17 software was used to analyze the raw files (missed cleavage = 0, enzyme = trypsin/P). The peptide with a score of more than 40 was considered to be the target peptide.
Quantitative PRM analysis for target proteins. We selected eight target peptides from the four DEPs for PRM quantitative analysis. We added the peptide retention time calibration mixture to the peptide mixture and used the labeled peptide “TASEFDSAIAQDK” (bold “K” for heavy isotope labeling) as the internal standard. We first separated 2 μg of peptide mixture with 20 fmol labeled peptides by HPLC, and then analyzed them by Q-Exactive mass spectrometer. Three times of repeated quantitative analysis were carried out, and raw data was calculated with skyline 3.5.0 software.