Experimental materials
In recent years, we have carried out interspecific hybridization between the female Nymphaea ‘Peter Slocum’ and male N. micrantha for three consecutive years, aiming at transferring the color gene of male parent to female parent. However, we did not obtain seeds, so we carried out a thorough and systematic study from the aspect of plant reproductive biology, and found that the main reason for the failure of the hybrid combination was the low compatibility between pollen and stigma before fertilization [3]. Therefore, in this study, an interspecific cross between the female ‘Peter Slocum’ and male N. micrantha was performed. Our aim was to further reveal the reasons of low compatibility between pollen and stigma at the molecular level on the basis of previous studies. In addition, self-pollinated ‘Peter Slocum’ and N. micrantha were successful. ‘Peter Slocum’ has a chromosome number of 84 (2n = 6x), whereas N. micrantha has a chromosome number of 56 (2n = 4x) [3]. These plants were grown in ponds in Xingxiang, Zhenjiang, Jiangsu Province, China.
Previous studies showed that pollen began to germinate at 2 HAP, and abnormal growth of pollen tubes was observed at 6 HAP [3]. The number of pollen tubes did not change at 6 and 12 HAP, indicating that the related genes in stigma had been expressed and new proteins were synthesized, while no new proteins were synthesized after 6 h. For this reason, the stigmas of 0, 2, and 6 HAP were collected. Each treatment had three biological repeats. The stigmas of 0 HAP (fresh male parent pollen was placed on stigma) were used as the control, and the stigmas from 2 and 6 HAP were used as the treatment to dynamically study the interaction between the pollen and stigma after pollination. After collection, the three samples, the stigmas of 0 HAP, the stigmas of 2 HAP, and the stigmas of 6 HAP, were immediately frozen in liquid nitrogen and stored at - 80°C.
High-throughput RNA-seq and data processing
Total RNA was extracted using Trizol reagent according to the manufacturer’s protocol (Takara Bio Inc., Otsu, Japan). The total RNA was checked for quality and quantity using an Agilent 2100 bioanalyzer (Agilent Technologies, CA, USA). The mRNA samples were enriched by oligo(dT) magnetic beads and then cut into fragments with fragmentation buffer at 80°C. First-strand cDNA was synthesized using random hexamers as the primers. Second-strand cDNA was synthesized from first-strand cDNA using RNase H, DNA polymerase I, and dNTPs. After purification and terminal repair, double-stranded DNA poly A and adaptor sequences were ligated to the end of the cDNA. cDNA libraries were constructed by PCR amplification after selecting for fragment size and undergoing a quality check with an Agilent 2100 Bioanalyzer system. Finally, the qualified cDNA libraries were sequenced with an Illumina HiSeq 2500 system. Three biological replicates were used in the RNA-seq experiments involving each sample, and sequenced as separate libraries.
High-quality clean reads were obtained by removing adaptor sequences, duplicated sequences, ambiguous reads (“N”), and low-quality reads. Transcriptomes were separately assembled de novo using Trinity (http://trinityrnaseq.sourceforge.net/). Clean reads with a certain overlap length were initially combined to form long fragments without N (named contigs). Related contigs were clustered using TGICL software (Pertea etal. 2003) to yield unigenes (without N) that could not be extended on either end, and redundancies were removed to acquire nonredundant unigenes [59]. The functional annotation of all-unigenes was performed using a BLAST search (http://blast.ncbi.nlm.nih.gov/Blast.cgi) against the GO, Pfam, KEGG, Nr, and Swiss-Prot databases.
The FPKM (fragments per kilobase per million mapped reads) method eliminated the influence of different gene lengths and sequencing levels on gene expression calculation. FPKM values were directly used to compare gene expression differences between samples. The DESeq package was used to obtain a “base mean” value for identifying DEGs. FDR ≤0.01 and an absolute value of log2ratio≥1were set as thresholds for significance of gene expression differences between the two samples.
Functional annotation and classification of the DEGs were conducted using the Blast 2 GO program (http://www.blast2go.com/b2ghome) [60]. Additionally, a KEGG pathway analysis (http://www.genome.jp/kegg-bin/search_pathway) was performed. The heat map was produced using Cluster 3.0 and treeview.
Label-free analysis of the stigma proteome of water lily
Protein extraction and peptide enzymolysis. Protein extraction was performed using the SDT (4% SDS, 100 mM DTT, 150 mM Tris-HCl pH 8.0) method. The protein concentrations were quantified with the BCA Protein Assay Kit (Bio-Rad, USA), and the samples were stored at -80°C. Protein digestion (200 mg for each sample) was performed using the filter aided proteome preparation procedure described by Wisniewski [61]. The peptides from each sample were desalted on C18 cartridges, concentrated by vacuum centrifugation, and reconstituted in 40 mL of 0.1% (v/v) formic acid.
MS/MS protein identification and quantification. Each fraction was injected for nanoLC-MS/MS analysis. The peptide mixture was loaded onto a reverse-phase trap column (Thermo Fisher Scientific Acclaim PepMap100, 100 mm × 2 cm, nanoViper C18) connected to the C18-reverse phase analytical column (Thermo Fisher Scientific Easy Column, 10 cm long, 75 mm inner diameter, 3 mm, C18-A2) in buffer A (0.1% formic acid) and separated with a linear gradient of buffer B (84% acetonitrile, 0.1% formic acid) at a flow rate of 300 nL/min controlled by IntelliFlow technology.
LC-MS/MS analysis was performed on a Q Exactive mass spectrometer (Thermo Fisher Scientific) coupled to an Easy nLC (Proxeon Biosystems, now Thermo Fisher Scientific) for 240 min. The mass spectrometer was operated in the positive ion mode. MS data was acquired using a data-dependent top10 method, dynamically choosing the most abundant precursor ions from the survey scan (300–1,800 m/z) for higher collision energy dissociation (HCD) fragmentation. The automatic gain control target was set to 1e6, and the maximum injection time was 50 ms. The duration of dynamic exclusion was 60.0 s. Survey scans were acquired at a resolution of 70,000 at 200 m/z; resolution for HCD spectra was set to 17,500 at m/z 200, the isolation width was 2 m/z, the normalized collision energy was 30 eV, and the under-fill ratio was defined as 0.1%.
For protein identification, the MS raw files were processed by Maxquant1.5.3.17 software [62]. The acquired MS/MS spectra were searched against the predicted protein databases translated from the above transcriptome databases in this study. The minimum peptide length was set to six amino acids and the maximum FDR was set to 1% for both peptides and proteins. The other parameters were set as follows: peptide mass tolerance = ± 20 ppm; enzyme = trypsin; max missed cleavage = 2; fixed modification: carbamidomethyl (C); variable modification: oxidation (M), acetyl (protein N-term). Protein quantification was based on both ‘razor’ and unique peptides [62-63], and the label free quantitation algorithm was performed [64]. For each fraction, peptides were matched across different LC-MS/MS runs based on mass and retention time (set to the match between runs option in MaxQuant) using the time window of 2 min.
DEPs were analyzed for significant downregulation or upregulation. For quantitative changes, a 2.0-fold cutoff was set to determine upregulated and downregulated proteins, with a p-value < 0.05 present in at least two replicates.
Bioinformatics analysis. Three biological replicates of each sample were used in the label-free analysis. To further explore the impact of differentially expressed proteins on different biological processes, enrichment analysis was performed. GO term and KEGG pathway enrichment analyses were performed based on the Fisher’s exact test, considering the whole quantified protein annotations as the background dataset. Benjamini-Hochberg correction for multiple testing was further applied to adjust derived p-values [65]. Only functional categories and pathways with adjusted p-values < 0.05 were considered as significant.
Correlation analyses of transcriptome and proteome profiles
A gene and its corresponding protein are considered to be correlated in compared stages if both the gene and protein were expressed at these stages. Next, the significance of the expression between the compared stages for both correlated transcripts and proteins was determined. If both a gene and its corresponding protein both showed significant difference in their expression levels between the compared stages, they were defined as cor-DEGs-DEPs. Gene ontology (GO) term annotation, KEGG pathway annotation, and functional enrichment analysis were then conducted for cor-DEGs-DEPs, and the Spearman correlation coefficient was calculated.
Selection of target peptides for PRM analysis
Peptide mixtures of nine samples were prepared using trypsin as described above for the label-free analysis. Equivalent peptides from each sample were pooled, and 2 µg of the pooled sample was introduced into an HPLC system via a trap column (100 μm×50 mm, 5 μm-C18) and then via an analytical column (75 μm×200 mm, 3 μm-C18). Separated peptides were then analyzed using a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Raw files were analyzed using Maxquant 1.5.3.17 software (enzyme = trypsin/P, missed cleavage = 0). Only peptides with scores over 40 were selected as target peptides.
Quantitative PRM analysis for target proteins
A total of eight target peptides of the four DEPs were selected and used for quantitative analysis to determine their feasibility. Peptide Retention Time Calibration mixture was added into the peptide mixture, and the labeled peptide “TASEFDSAIAQDK” (the bold “K” indicates the heavy isotopic labeling) was used as the internal standard. Two micrograms of peptide mixture containing 20 fmol labeled peptide was separated by HPLC and then analyzed by a Q-Exactive mass spectrometer. Quantitative analysis was repeated three times and the raw data was calculated by Skyline 3.5.0.