Sampling information and ethics statement
Collection localities and treatments for all samples are given in Supplementary Table 1. The collecting permit and all study protocols were approved by the Animal Care and Ethics Committee of Kunming Institute of Zoology, Chinese Academy of Sciences, and were conducted in strict accordance with the guidelines for Animal Care and Use at the Kunming Institute of Zoology (SMKX-20160301-03).
To exclude potential influences of sex differences and developmental stages, we only collected male adults during their reproductive periods for all the studied frogs. Captured frogs were housed in artificial ponds with sufficient food at appropriate temperature. At least three individuals with identical treatment were sampled, and the mean snout-vent lengths of each species (N. parkeri: 4.300 ± 0.2 cm; N. phrynoides: 8.2 ± 0.4 cm; Q. spinosa: 8.7 ± 0.2 cm; mean ± SD) were measured using a digital Vernier caliper to ensure they were of similar age. After the frogs were sacrificed, their dorsal skin was isolated rapidly for subsequent analyses.
UV exposure experiments and skin samples collection
To set the UV dosage used in the experiments, we obtained illumination photometry measurements of local habitats of the studied species using an illuminometer (TES-1339, TAIWAN) and then referred to previous research (11) based on the proportion between elevation and illumination. We used a dose of ~3.7mW/cm2 UV exposure in experiments, which is slightly higher than the daily illuminance of local conditions that N. parkeri experiences (~3.5mW/cm2). In the pilot assesment of tolerance to UV, we exposed individuals of studied frogs to a continuous ~3.7mW/cm2 UV light until a demise was observed. At the 8 h of UV exposure, we observed the first demise in N. phrynoides and serious skin peeling in Q. spinosa, whereas individuals of N. parkeri showed no behavioral signs of distress and exhibited only mild skin peeling. Hence, continuous UV exposure (up to 8 h) at a dose of ~3.7mW/cm2 was established for use in the time-course UV exposure experiments.
In the UV-exposure experiment, three to four male adults of N. parkeri and its lower-elevation relatives were washed by ddH2O and set in a plastic box (20 x 15 x 15 cm) filled with 200ml ddH2O. Then we exposed individuals (n = 27 for N. parkeri and N. phrynoides, n = 20 for Q. spinosa) to UV light at a dose of ~3.7mW/cm2 following a time course (0, 0.5, 1, 2, 4, 6 and 8 h). Frogs in the same box were treated with identical exposure time (at a time point). 50 μL ddH2O samples containing skin secretions were collected separately from the groups from 0 h to 8 h at intervals of 10 minutes to evaluate their ABTS+ free radical-scavenging activities. Dorsal skin samples of individuals were collected separately from same position for the subsequent identification of metabolic compounds, histological analysis, and mRNA and miRNA sequencing.
Statistical analysis of changes to skin histology
To examine skin structure, we collected dorsal skin samples near the foramen magnum in a 1 cm x 1 cm square immediately after the frogs were sacrificed and subjected them to hematoxylin-eosin (HE) staining procedures (11). The skin tissues were fixed in 10% buffered formalin overnight and subsequently preserved in 70% ethanol for storage under ambient conditions. The tissues were then embedded in paraffin. Histological sections with 6 μm thickness were removed on positively charged slides, which were then stained with the HE solution. All the tissue samples were examined and photographed in a blinded manner. Images of each histological section were captured by a Leica DM4000B microscope (Leica, Heidelberg, Germany) at ×100 or ×200 magnification. All the skin morphological characteristics, including melanin content and the proportion of necrotic areas, were quantified based on images of HE stained sections with Pro-Plus® (IPP) version 6.0 (http://www.mediacy.com/imageproplus). All the images used to statistics are given in Supplementary Data1.
After capturing images of histological sections at ×200 magnification, at least 3 images of each individual were stochastically selected to quantify damage degree and melanin content. Hence, a minimum of 54 images from each species (from 6 time points of UV exposure experiment) were used for statistical analysis. For each histological image, the proportion of hollowed-out areas within the overall epithelial area was calculated to measure the degree of damage (%). For melanin content assessment, chromocytes were identified by comparing the integrated optical density and the background. The mean integrated optical density (MIOD) (59) for each image was measured to represent melanin content in skin. After quantification of skin structures, a two-tailed t-test was used for significance testing.
Measurement of ABTS+ free radical-scavenging activities
The collected solutions containing skin secretions were centrifuged at 12,000rpm, 4°C for 10 minutes, and then the supernatants were lyophilized and stored at -80°C until use. A 2, 2’-azino-bis (3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) scavenging test (60) with some modification was adopted to evaluate the antioxidant activity of the samples. Briefly, a stock solution of ABTS radical (Sigma-Aldrich, USA) was prepared by incubating 2.8 mM potassium persulfate (Sigma-Aldrich, USA) with 7 mM ABTS in water for at least 6 hours in the dark, after which it was used immediately. The stock solution was diluted 50-fold with ddH2O, and then 50 μL samples were added to the diluted stock solutions and kept from light for 30 minutes, with a blank control of same volume of ddH2O. Vitamin C (Sigma-Aldrich, USA) dissolved in H2O was used as the positive control. The decrease in absorbance at 415 nm indicated the antioxidant activity of the samples. The rate of free radical scavenging (%) was calculated as follows: (Ablank–Asample) × 100/Ablank.
Metabolite identification and quantification
Each skin sample (100 mg) was ground with liquid nitrogen and homogenate was resuspended with prechilled 80% methanol and 0.1% formic acid by well vortex. The samples were incubated on ice for 5 min and then were centrifuged at 15,000 g, 4°C for 20 min. Some of supernatant was diluted to final concentration containing 53% methanol by LC-MS grade water. Then supernatant was subsequently transferred to a fresh Eppendorf tube and w centrifuged at 15000 g, 4°C for 20 min. Finally, the supernatant was injected into the LC-MS/MS system analysis (61).
UHPLC-MS/MS analyses were performed using a Vanquish UHPLC system (ThermoFisher, Germany) coupled with an Orbitrap Q ExactiveTMHF-X mass spectrometer (Thermo Fisher，Germany) in Novogene Co., Ltd. (Beijing, China). Samples were injected onto a Hypesil Gold column (100×2.1 mm, 1.9μm) using a 17-min linear gradient at a flow rate of 0.2mL/min. The eluents for the positive polarity mode were eluent A (0.1% FA in water) and eluent B (methanol).The eluents for the negative polarity mode were eluent A (5 mM ammonium acetate, pH 9.0) and eluent B (methanol).The solvent gradient was set as follows: 2% B, 1.5 min; 2-100% B, 12.0 min; 100% B, 14.0 min; 100-2% B, 14.1 min; 2% B, 17 min. Q ExactiveTMHF-X mass spectrometer was operated in positive/negative polarity mode with spray voltage of 3.2 kV, capillary temperature of 320 °C, sheath gas flow rate of 40 arb and aux gas flow rate of 10 arb.
The raw data files generated by UHPLC-MS/MS were processed using Compound Discoverer 3.1 (CD3.1, ThermoFisher) to perform peak alignment, peak picking, and quantitation for each metabolite. The main parameters were set as follows: retention time tolerance, 0.2 minutes; actual mass tolerance, 5ppm; signal intensity tolerance, 30%; signal/noise ratio, 3; and minimum intensity, 100,000. After that, peak intensities were normalized to the total spectral intensity. The normalized data was used to predict the molecular formula based on additive ions, molecular ion peaks and fragment ions. And then the peaks were matched against the mzCloud (https://www.mzcloud.org/), mzVault and Mass List database to obtain the accurate qualitative and relative quantitative results. Statistical analyses were performed using the statistical software R (version R-3.4.3), Python (v 2.7.6) and CentOS (CentOS release 6.6).
The metabolites were annotated using the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (https://www.genome.jp/kegg/pathway.html), HMDB database (https://hmdb.ca/ metabolites) and LIPIDMaps database (http://www.lipidmaps.org/). Principal components analysis (PCA) and Partial least squares discriminant analysis (PLS-DA) were performed at metaX (a flexible and comprehensive software for processing metabolomics data). We applied univariate analysis (two tailed t-test) to calculate the statistical significance (P-value). The metabolites with P-value < 0.05 and fold change (FC) ≥ 2 or ≤ 0.5 were considered to be differentially expressed metabolites (DEMs). The functions of these metabolites and metabolic pathways were studied using the KEGG database (Supplementary Fig. 2).
Genome sequencing, assembly and annotation
One female N. parkeri from Lhasa, Tibet Autonomous Region (~3,697m) of China was prepared for DNA isolation (Supplementary Table 1). After euthanasia, liver, muscle, heart, lung, brain, skin, kidney, and ovary tissues were deprived and stored in liquid nitrogen. Briefly, genomic DNA for SMRT bell library construction was extracted from liver with QIAGEN® Genomic DNA extraction kit. Total RNA was extracted from each of the above tissues using TRIzol reagent (Invitrogen Corp., Carlsbad, CA) and RNeasy Mini Kit (Qiagen, Chatsworth, CA).
For long-read sequencing, SMRTbell libraries with fragment size of 20 kb were constructed with SMRTBell template preparation kit 1.0 (Pacific Biosciences, Menlo Park, CA, USA). The libraries were sequenced by the PacBio Sequel system (Pacific Biosciences) with 25 SMRT cells. This generated 237.27 Gb of PacBio long-read data (21.25 million reads) with N50 > 17 Kb. These data were de novo assembled to obtain a preliminary N. parkeri genome (approximately 2.46Gb) with Falcon (62) and contig N50 was 2.32 Mb. To estimate genome size and perform error correction of the assembled genome, short-read sequencing data were obtained based on a paired-end library with short insert sizes of about 350 bp. Raw reads were produced using BGISEQ-500 platform, with read lengths of 2×150 bp. SOAPnuke (63) software was used to filter adapter and low-quality data.
To obtain a chromosome-level assembly, the Hi-C technique was applied to capture genome-wide chromatin interactions. Genomic DNA in muscle samples was fixed with formaldehyde in a concentration of 1% to allow cross-linking of cellular protein. Three Hi-C libraries were prepared following the standard Hi-C library protocol. Then we used BGISEQ-500 platform to sequence the libraries with paired-end 100 bp reads. Same to short-read sequencing, SOAPnuke was used to filter adapter and low-quality data, and then the Hi-C reads were evaluated and further filtered by HiC-Pro (64). Approximately 394.87 Gb clean data were generated.
For transcriptome sequencing, a single pooled RNA sample was prepared by mixing equal volumes of the RNA extracted from each tissue. Total RNA was synthesized to the first-strand cDNA using Clontech SMARTer PCR cDNA Synthesis Kit. After PCR Optimization, a large-Scale PCR was performed to synthesize second-strand cDNA. After another large-Scale PCR, the DNA was used for SMRTbell library construction. Qualified sequencing data were obtained by sequencing this library using PacBio Sequel platform. To obtain consensus full-length isoforms, we performed SMRT analysis through Reads of insert, Classify, Cluster, and Quvier.
To estimate the genome size and heterozygosity of the N. parkeri genome, a k-mer depth frequency distribution analysis (with k=17) of the short-read sequencing data was performed. After polishing the primary assembly using both PacBio long-read data and BGISEQ short-read data with Pilon (65), we got a high quality contig level genome of N. parkeri. The finally total length of the N. parkeri genome was 2.47 Gb, with a contig N50 of 2.34 Mb (Supplementary Table 2). We used BUSCO (66) to evaluate the finally obtained contig-level assembly. This analysis showed that 93.5% of the orthologous genes were retrieved in the assembly, including 87.9% complete genes and 5.6% fragmented genes (Supplementary Table 3). To perform chromosome-level assembly of the genome based on chromatin conformation capture technology, Lachesis (67) was used to cluster, order, and orient the contigs based on valid reads of Hi-C. Finally, we obtained 13 pseudochromosomes with a genome size of 2.47 Gb and scaffold N50 of 268.57 Mb, which is the first chromosome-level high-elevation amphibian genome (Supplementary Tables 4 and 5, Supplementary Fig. 4).
De novo-based, homology-based, and RNA sequence-based methods were used to annotate the positions and structures of protein-coding genes. De novo prediction was performed using Augustus (68). For homology-based prediction, homologous gene sets of the protein gene sets from Homo sapiens, Gorilla gorilla, Macaca mulatta, Mus musculus, Rhinopithecus roxellana were searched in the N. parkeri genome with BLAST+ (69), and then gene structure was identified using GeneWise (70). For RNA sequences-based methods, the full-length transcriptome sequences from PacBio Sequel platform were mapped to the N. parkeri genome for gene structure prediction using GMAP (71). Finally, genes predicted from the above methods were integrated by EVidenceModeler (EVM) (72) into a non-redundant consensus of gene sets. Five different public protein databases, including SwissProt, TrEMBL (73), KEGG (74), Gene Ontology (GO) (75), and NR (NCBI non-redundant protein sequence database) (76), were selected for gene functional annotation of N. parkeri using Blast+ (69). The integrated methods yielded a nonredundant gene set composed of 22,884 protein-coding genes (accounting for 99.58% of total genes) in the N. parkeri genome (Supplementary Table 6), with an average exon number per gene of 7.78. The average transcript length, average coding sequence (CDS) length, average exon length and average intron length were 31,131 bp, 1,382 bp, 177 bp, and 4,381 bp, respectively (Supplementary Table 7).
We annotated the repetitive sequences in the N. parkeri genome using both the homology prediction method and the de novo prediction method, accounting for 58.90% of the genome. First, RepeatMasker and RepeatProteinMask (77) were used based on RepBase TE library (https://www.girinst.org/repbase/). As a result, most of the repeated sequences were transposable elements (TEs), occupying 55.20% (1.36 Gb) of the assembly (Supplementary Table 8). Then, non-coding RNAs (including microRNAs) were identified through aligning the N. parkeri genome to the Rfam database (78) with default parameters. The alignments with an E-value < 10-6 were considered statistically significant and their corresponding genomic regions were annotated as conserved non-coding RNA genes. Infernal (79) was used to filter raw results to obtain the final non-coding RNAs. These non-coding RNAs were further categorized into microRNA (miRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA) (Supplementary Table 9). The raw sequences for the whole genome assembly and annotation information have been deposited in the Genome Sequence Archive (GSA) of National Genomics Data Center (NGDC) at https://ngdc.cncb.ac.cn/gsa/ (accession: CRA004362). The whole-genome assembly data have been deposited in the Genome Warehouse (GWH) of NGDC at https://bigd.big.ac.cn/gwh under accession number GWHBCKU00000000. The annotation files can be found in Figshare at https://figshare.com/projects/Genomic_data_of_Nanorana_parkeri/116061.
Library construction and transcriptome sequencing of skin samples
To interpret how defense-related genes change in expression with ongoing UV exposure, we collected dorsal skin tissue of three to four individuals at each time point (in the time-course of UV exposure) in each species for mRNA-seq. Total RNA was extracted using TRIzol reagent (Takara) and RNeasy Mini Kit (Qiagen, Valencia, CA). RNA purity was assessed using kaiaoK5500®Spectrophotometer (Kaiao, Beijing, China), the integrity and concentration was assessed using RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). A total of 2 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext® UltraTM RNA Library Prep Kit for Illumina® (#E7530L, NEB, USA) following the manufacturer’s recommendations. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primer and RNase H. Second strand cDNA synthesis was subsequently performed using buffer, dNTPs, DNA polymerase I and RNase H. The library fragments were purified with QiaQuick PCR kits and elution with EB buffer, then terminal repair, A-tailing and adapter added were implemented. RNA concentration of library was measured using Qubit® RNA Assay Kit in Qubit® 3.0 to preliminary quantify and then dilute to 1 ng/μl. Insert size was assessed using the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA), and qualified insert size was accurate quantification using StepOnePlusTM Real-Time PCR System (Library valid concentration>10 nM). The clustering of the index-coded samples was performed on a cBot cluster generation system using HiSeq PE Cluster Kit v4-cBot-HS (Illumina) according to the manufacturer’s instructions. After cluster generation, the libraries were sequenced on an Illumina platform and 150 bp paired-end reads were generated. Sequencing reads that contained low-quality, adaptor-polluted, or high content of unknown base reads were removed.
de novo transcript assemblies of N. phrynoides and Q. spinosa
For N. phrynoides and Q. spinosa, for which reference genomes were lacking, clean reads from unexposed group were merged for de novo assembly using Trinity (80) with default settings (--min_contig_length 150 --min_kmer_cov 3 --min_glue 3 --bfly_opts ‘-V 5 –edge-thr=0.1 –stderr). And then the reference transcript was produced by eliminating redundant contigs upon assembly with CD-HIT (81). To produce longer and more complete consensus sequences, the TGICL pipeline was used to cluster and assemble sequences with CAP3 (82). TransRate (83) was used to evaluate the quality of the reference transcript based on common statistical indexes, including number of contigs, mean contig length, N50 values, and GC% (Supplementary Table 10). The clean reads of mRNA-seq of studied species in this paper are given in the GSA of NGDC under the accession number CRA004830.
A modified reciprocal best hit method (version 0.6.9 “crb-blast”) (84) was used to identify orthologous genes between the transcript assemblies (N. phrynoides and Q. spinosa) and the choromosome-level genome of N. parkeri with high accuracy. All of the orthologs were aligned with FasParser (85). The coding region of each assembled contig was determined using TransDecoder (version 3.0.1 with “-m 50” option) (86). If more than one open reading frame (ORF) was predicted, the one with maximum similarity to the sequence in N. parkeri was chosen. Prank (87) and Gblocks (88) were used to align each gene into a consensus 1:1 orthologous gene set (Supplementary Data3) for subsequent evolutionary analyses. To reduce false positives, alignments with a final length of less than 120 bp were discarded. Because a high mean identity ratio (MIR) is assumed to indicate a accurate orthology prediction, we estimated the MIR for each alignment to evaluate the quality of our orthologous gene set.
Expression abundance calculation
For N. parkeri with a chromosome-level genome, all of the clean reads obtained from mRNA-seq were mapped to the reference genome useing HISAT2 (89) (--phred33 --sensitive --no-discordant --no-mixed -I 1 -X 1000). SAMtools (90) was used to sort and convert the SAM files to BAM files. Then, StringTie (91) (-f 0.3 -j 3 -c 5 -g 100 -s 10000 -p 8) was used to calculate transcript expression levels. A Python script (prepDE.py) was used to extract read count information from the files generated by StringTie. For N. phrynoides and Q. spinosa, clean reads were mapped to the corresponding reference transcript using Bowtie2 (92) (-q --phred64 --sensitive --dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1 -I 1 -X 1000 --no-mixed --no-discordant -p l -k 200). We calculated the expression (FPKM) of each transcript with RSEM (93) for each individual. A heatmap of gene expression was drawn using the eGPS (94) based on Z-scores-normalized FPKM values.
Temporal clustering of genes from time-course expression profiles
We obtained a gene expression profile along with a UV-exposure timeline for each species. To trace the changing pattern of gene expression, we used Short Time-series Expression Miner (STEM) software (version 1.3.12) (31) to classify significant temporal groups. Before implementing STEM, we calculated the median gene expression value at that time point and normalized the median value with the log2 function. The normalized expression dataset was used as the input for STEM. The default parameters with “Log normalize data” as the data info and “STEM Clustering Method” as the clustering method, were selected. The cluster results from STEM (Supplementary Fig. 6) indicated four major temporally clustered groups of genes in N. parkeri based on changes in their expression patterns: genes with gradual decreases in expression (Clustering Profile #10), genes with gradual increases in expression (Clustering Profile #39), genes with early-phase (at around 0.5 h of time-course) upregulation (Clustering Profiles #44 and #43) and genes with later-phase (at around 8 h of time-course) upregulation (Clustering Profiles #27 and #28). The genes showing gradual decreases and increases in expression were rendered seperately to display the results of functional enrichment analysis with Metascape (95).
Identification of co-expressed genes in temporal groups
To identify the critical genes showing early-phase upregulation and later-phase upregulation, the WGCNA package (32) was used to identify coexpressed genes that were strongly related to specific temporal groups (with options “unsigned correlation” and “minimum cluster size = 30”). The Benjamini-Hochberg method was used to correct for multiple testing when calculating the correlation between modules and time points (Supplementary Fig. 7). The genes that were highly correlated with a module (MM > 0.6, P-value < 0.05) were considered as co-expressed genes in the temporal groups. GO enrichment analyses on genes within each of these temporal groups were performed using Metascape. The gene list and correlation with modules, as well as respective GO terms in each temporal group can be found in Supplementary Tables 3–6.
miRNA sequencing, annotation and expression calculation
Conserved miRNAs that were predicted in silico could be masked as transposable elements (TEs) due to sequence similarity between them. In addition, novel miRNAs in N. parkeri could easily be missed. To confirm whether there are novel miRNAs that may have evolved as a UV defense in N. parkeri, we performed miRNA-seq analysis of the same individual samples (n = 19 for N. parkeri; n = 24 for N. phrynoides; n = 14 for Q. spinosa) used for mRNA-seq. RNA extraction and quality assessment were performed following the above protocols. Total RNA was separated by 15% agarose gels to extract the small RNA (18-30 nt). After precipitation by ethanol and centrifugal enrichment of the small RNA sample, the library was prepared according to the method and process of Small RNA Sample Preparation Kit (Illumina, RS-200-0048). The primary steps of this procedure are as follows: (1) Connection of the 3' adaptor to the separated small RNA. (2) Connection of the 5' adaptor to the separated small RNA. (3) RT-PCR. (4) Recycling strips of 145-160bp (22-30nt RNA). The RNA concentration of the library was measured using a Qubit® RNA Assay Kit in Qubit® 2.0 to quantify and then dilute to 1ng/μl. Insert size was assessed using the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA), and after the insert size was found to be consistent with expectations, insert size was accurately quantified using Taqman fluorescence probe of AB Step One Plus Real-Time PCR system (Library valid concentration > 2nM). The qualified libraries were sequenced using an Illumina platform to generate 50 bp single-end reads. Sequencing reads that contained low-quality, adaptor-polluted, or high content of unknown base reads were filtrated. The 3' adapters were clipped from the sequencing reads. Reads with the same sequence and those with a length shorter than 15 bp after clipping the 3' adapter were discarded. The clean reads of miRNA sequencing of studied species in this paper are given in the GSA of NGDC under the accession number CRA005235.
The filtered reads were mapped to the corresponding genome (N. parkeri) or reference transcripts (N. phrynoides and Q. spinosa) using Bowtie (-best -strata -v 0 -k 100) (96) and then analyzed with miRDeep2 (97) to annotate mature miRNAs and precursors. The annotation was based on known miRNAs (from the available well annotated genomes including Xenopus laevis, X. tropicalls, Anolis carolinensis, Gallus gallus, Homo sapiens) curated in miRBase (release 22) (98) without any mismatches allowed. miRNAs with a read conts < 5 and a true positive probability < 60% were excluded from downstream analyses. This analysis resulted in two categories of known and novel miRNAs. To exclude sequencing noise falsely classified as known or novel miRNAs, we also predicted the reliable miRNAs for each individual using the miRNA-seq data. In this screen, if miRNA expression was detected in at least two of the three (or four) individuals in one time point of UV exposure, we considered them as reliable miRNAs. The numbers of known and novel miRNAs identified in each species are listed in Supplementary Data 8.
The miRNAs of each species were mapped to the respective genome or transcripts for abundance quantification. Some miRNA reads were counted multiple times when miRDeep2 assigned them to mature miRNAs, and some were mapped to multiple genomic loci. Therefore, the normalization of miRNA reads was performed by dividing the reads quantified by miRDeep2 with the total miRNA reads quantified by Bowtie. We used the normalized data as our miRNA expression data, and log2-transformed data (Supplementary Data 9) were used for further analyses.
miRNA target prediction
mRNA and miRNA expression data from the same individual were combined as the input data for WGCNA to identify co-expressed miRNAs and mRNAs in the early-phase temporal group and later-phase temporal group. miRNAs showed a significant negative correlation (MM < 0, P-value < 0.05) with a temporal group were selected as co-expressed miRNA candidates (Supplementary 10). To identify co-expressed miRNAs in gradually increasing group and gradually decreasing group, expression profiles of the total miRNAs were further clustered using STEM (Supplementary Fig. 11). In the N. parkeri STEM results, the miRNAs in Clustering Profile #10 and Clustering Profile #39 were separately considered candidates for the gradually increasing and gradually decreasing temporal groups. The mRNA targets of all the co-expressed miRNAs were predicted with TargetScan (version 7.1) (99) with the parameter “Total context++ score < -0.1”. A miRNA may have many potential target genes. To identify which miRNAs explicitly function in regulating the expression of genes involved in the UV defense systems of N. parkeri, we further compared changes in the expression patterns of a miRNA and its predicted mRNA targets. Only those miRNAs that showed a significant negative correlation (showing expression inhibition as a function) with their predicted targets in a specific temporal group were retained. These miRNAs were considered modulators of gene expression in UV defense systems. The correlation of miRNAs and target mRNA in expression was qualified by Spearman’s rank correlation coefficient test in R (Supplementary Table 11).
Based on the phylogenetic relationships of the studied species (46), codeml was used to estimate dN, dS, and dN/dS values with the free ratio model (“model=1”). An improved branch-site model (Null hypothesis: model = 2, NSsites = 2, fix_omega = 1, omega =1; Alternative hypothesis: model = 2, NSsites = 2, fix_omega = 0, omega = 1) in PAML (47) was used to identify genes showing positive selection along the N. parkeri branch, which was assigned as the foreground branch. A likelihood-ratio test (LRT) was used to compare the observed positive selection to a null model, and the corresponding P-value was calculated. Positively selected sites were deduced through Bayesian empirical Bayes (BEB) analysis.
In vitro synthesis of tyrosinase and confirmation of its activity
To investigate the activity of tyrosinase, we separately synthesized DNA sequences encoding the N. parkeri and Q. spinosa tyrosinase protein sequence in vitro (see Supplementary Fig. 13 for synthesized protein expression). We inserted the synthesized tyrosinase sequences into PET-32a bacterial plasmids, which were then transferred into BL21 competent cells (TSV-A09, TSINGKE, Kunming, China). The bacterial strains transfected with the recombinant plasmid were inoculated in LB liquid medium resistant to ampicillin (A610028, Sangon Biotech, Shanghai, China), and cultured at 37℃ with agitation 180 rpm to OD600 = 0.6. The recombinant fusion tyrosinases were isolated and purified under 4℃. Then, the tyrosinase was expressed with 200mM CuSO4 (A600063-0500, Sangon Biotech, Shanghai, China) and 100mM IPTG (I6148, Macklin, Shanghai, China) induction at 28℃ with shaking (100rpm) for 10 hours, and purified under 4℃ by Ni-NTA agarose column (30230, QIAGEN, Valencia, CA, USA), according to the instruction. With the buffer (15μM CuSO4 50mM Na2HPO4, pH 6.2), the tyrosinase solution was reacted with the substrate L-DOPA (D9628-5G, Sigma-Aldrich, MO, American) at concentrations of 0.625mM, 1.5mM, 2.5mM, 5mM and 10mM. The absorbance at 475nm was measured with Microplate Reader (Epock, BioTek, VT, American) at 1-minute intervals, and monitored for 1 h. The dynamics of the tyrosinase catalysis are showed in a fitted curve generated with the Line weaver-Burk equation.
For experiments with replicates, the results are shown as mean ± s.d. with replicates from independent biological experiments. Two-tailed t-tests were executed using ‘t.test’ in R (parameters, alternative = ‘two.sided’; paired = TRUE). Spearman’s rank correlation coefficient test statistic was performed using the ‘cor.test’ function in R (parameters, method = ‘spearman’; exact = TRUE).
Data are available in the Genome Sequence Archive (GSA) (100) in National Genomics Data Center (NGDC) (101), China National Center for Bioinformation / Beijing Institute of Genomics (CNCB/BIG), Chinese Academy of Sciences (CAS) at https://ngdc.cncb.ac.cn/gsa/ (GSA accessions: CRA004362, CRA004830 and CRA005235), the Genome Warehouse (GWH) of NGDC at https://bigd.big.ac.cn/gwh (accession: GWHBCKU00000000) and Figshare (https://figshare.com/projects/Genomic_data_of_Nanorana_parkeri/116061).