Meta-omics approaches reveal unique small RNAs exhibited by the uncultured microorganisms dwelling deep-sea hydrothermal sediment in Guaymas Basin

Small regulatory RNAs (sRNAs) are present in almost all investigated microbes, regarded as modulators and regulators of gene expression and also known to play their regulatory role in the environmentally significant process. It has been estimated that less than 1% of the microbes in nature are culturable in the laboratory, hindering our understanding of their physiology, and living strategies. However, recent big advancing of DNA sequencing and omics-related data analysis makes the understanding of the genetics, metabolic potentials, even ecological roles of uncultivated microbes possible. In this study, we used a metagenome and metatranscriptome-based integrated approach to identify small RNAs in the microbiome of Guaymas Basin sediments. Hundreds of environmental sRNAs comprising 228 groups were identified based on their homology, 82% of which displayed high similarity with previously known small RNAs in Rfam database, whereas, “18%” are putative novel sRNA motifs. A putative cis-acting sRNA potentially binding to methyl coenzyme M reductase, a key enzyme in methanogenesis or anaerobic oxidation of methane (AOM), was discovered in the genome of ANaerobic MEthane oxidizing archaea group 1 (ANME-1), which were the dominate microbe in the sample. These sRNAs were actively expressed in local Guaymas Basin hydrothermal environment, suggesting important roles of sRNAs in regulating microbial activity in natural environments.


Introduction
Small non-coding RNAs (sRNAs) are short untranslated transcripts (50-500 nucleotide [nt]) that are generally encoded in intergenic regions (IGRs) of the genomes. Most of the sRNAs act as regulators of the gene expression and could modulate expression both at the transcriptional and post-transcriptional level (Gottesman and Storz 2011). Many of them appeared to regulate environmentally significant processes including quorum sensing (Fu et  2019), iron homeostasis (Chareyre and Mandin 2018), amino acid and vitamin biosynthesis (Yang et al. 2017;Marinho et al. 2019), stress responses and photosynthesis (Brenes-Alvarez et al. 2016). Previous sRNA identification studies have mainly been carried out on and remain limited to a few model microorganisms, so little is known about their diversity and ecological relevance in environmental microbial communities of which more than 99% remain uncultivated (Lewis et al. 2010;Puspita et al. 2012;Solden et al. 2016). As most of the previous studies on sRNA discovery were limited to cultured microbes, environmental sRNAs derived from uncultured microbes largely remain unknown. Previous studies have reported sRNA identification in metagenomic data, deciphering that sRNAs are the abundantly existing type of RNA in almost all microorganisms. For example, Weinberg et al. reported that metagenomic samples contain diverse classes of environmental RNAs which previously remain unknown (Weinberg et al. 2009). In another study, they discovered more than hundred structured sRNA using comparative genomics-based approaches from metagenomic and genomic data, including twelve metabolite sensing sRNAs, three of which were experimentally validated (Weinberg et al. 2010). Metatranscriptomic data have also been used to identify the sRNAs, for example, Murakami et al. analyzed the metatranscriptomic data of oceanfront deep-subsurface hot spring and identified more than 2000 novel sRNA (Murakami et al. 2012). Similarly, Shi et al. reveal unique sRNAs in Ocean's water column through metatranscriptomic analysis (Shi et al. 2009). The discovery of the sRNAs from the various environments containing uncultured microbes is indispensable to get insights into their occurrence and functional diversity. To study the phylogenies of uncultured microorganisms and establishing their population dynamics, metagenomics has come forward as a powerful technique overcoming the need for culturing the microbes. Metagenomic research could reveal genomic sequences and provide diversity analysis of microbial species. Previously, metagenome analysis has been performed in different environments ranging from deep-sea hydrothermal vents to human gut (Chu et al. 2021;Zeng et al. 2021;Leng et al. 2022). A limitation associated with the metagenome research includes it does not provide any information regarding which genes are expressed under which conditions, therefore, could not reveal information about the functional activity of microorganisms and their adaptation mechanisms. Thanks to metatranscriptomic research, which could reveal gene expression profiles under different environmental conditions (Ranjan et al. 2021). Therefore, metagenomics and metatranscriptomics based integrated approaches could provide more insights into the microbial diversity, dynamics of their gene expression and the potential role of sRNAs in adaptation strategies of microbial species.
In this study, we use both the techniques for identifying active small regulatory RNAs in the Guaymas Basin hydrothermal environment. The Guaymas Basin is a semiclosed basin in the Gulf of California (Byrne and Emery 1960) characterized by the hydrothermally active environment that includes anoxic sediments seeps and vent plumes. High biological productivity in the water and terrigenous input from the Mexican mainland and Baja California lead to rapid sulfide-and hydrocarbon-rich, anaerobic sediments accumulation of approximately 400 m in thickness (Simoneit and Lonsdale 1982). Guaymas Basin represents unique features as compared to other hydrothermal vent sites as huge amounts of petroleum-like products, such as gasolinerange aliphatic and aromatic hydrocarbons, short-chain fatty acids, ammonia and residual polar asphaltic material are produced using pyrolysis. Based on high carbonate content, the vent fluid also exhibits unique chemistry, i.e. nearly neutral pH and release a vast quantity of methane (Welhan 1988). Anaerobic methane-oxidizing archaea (ANME) which are the known active group therein (Wang et al. 2019), undergo sulfate-dependent, anaerobic oxidation of methane (AOM) to maintain the global flux of methane (Kevorkian et al. 2021;Iasakov et al. 2022).
Here, we applied a comparative genomics approach to detect sRNAs in uncultured deep-sea microbes using their metagenome and metatranscriptome, and also focused the methane metabolizing archaeal group belonging to phylum Euryarchaeota, ANMEs. Almost all known anaerobic methane-metabolizing archaea utilize methyl coenzyme M reductase (MCR) as the key enzyme in methanogenesis and AOM processes (Wongnate et al. 2016). MCR is highly conserved, and in particular, the gene encoding the alpha subunit of the enzyme complex (mcrA) has been utilized as a molecular marker gene for the detection and phylogenetic investigation of methane/alkane metabolizing Archaea (Wang et al. 2020). A plethora of novel sRNAs in deep-sea microbes and their metagenome and metatranscriptome were identified in this study, particularly we reported the presence of a putative cis-regulatory RNA at the 5′-end of mcrA in ANME-1, which is likely regulating the expression of this significant gene, suggesting its important role in methane metabolism. Although, the putative cis-sRNA requires further experimental validation, however, this is the first computational discovery of sRNA related to methane metabolism. Furthermore, nearly complete genomes of ANME-2a and ANME-2d were scanned to detect sRNAs using two independent comparative genomics-based approaches.

Sample description, metagenome and metatranscriptome data collection
The sediment sample used in this study was collected by push cores at a deep-sea vent site in Guaymas Basin (Gulf of California, Mexico) in October 2008 during cruise AT15-25 on Alvin dive 4460 (latitude 27º 0.71238 N, longitude 111º 24.3237 W, depth 2016 m). The sediment core 10 was used in this study, it was oily and covered with a white microbial mat. The temperature was measured with the Alvin temperature probe and increased from ~ 7-10 °C at the surface to 31 °C at 6 cm sediment depth. Sediment layers at every 2 cm depth interval were sectioned and stored at −70 °C and the sediment layer at 4-6 cm below the surface was used in this study for DNA isolation (Wang et al. 2019).
SDS-based method was used for the DNA extraction from the sample and descriptions of metagenome library construction and sequencing has been reported earlier (He et al., 2016). Details of bioinformatics analysis performed on raw Hiseq metagenomic reads is given below. Descriptions of metatranscriptomic data including methods of RNA extraction, complementary DNA Synthesis, and Sequencing has been reported in the previous study (Wang et al., 2019). In detail, total RNA was extracted by an SDS-based method. Briefly, 0.5 g of subsampled sediment with six replicates was mixed with an equal weight of 0.1 mm diameter glass beads and 670 μl of extraction buffer (100 mM Tris-HCl, 100 mM Sodium EDTA, 100 mM Sodium Phosphate and 1.5 M NaCl, pH 8.0), and then homogenized with a tissue lyzer (Tissuelyser-48, Shanghai Jingxin, China) at 30 Hz for 30 s, with a 120 s interval for two cycles (4 °C). Fifty μl lysozyme (20 mg/ ml) and 10 μl proteinase K (20 mg/ml) were added and incubated for 30 min in a 37 °C water bath. After incubation, 70 μl of 10% SDS was added and incubated at 55 °C for 30 min, with gentle mixing every 10 min. The supernatant was collected after 10 000 × g centrifugation for 10 min at 4 °C and transferred into a 2 ml tube. The supernatant was mixed with an equal volume of Phenol: chloroform: Iso-amyl alcohol (P:C:I, 25:24:1, vol/vol) pH 5.7. The aqueous phase was retained by centrifugation and precipitated with 0.6 vol of isopropyl alcohol and 0.3 M sodium acetate (pH 5.2) at 4 °C for 30 min. The tube was centrifuged at 16 000 × g for 20 min at 4 °C and the supernatant was removed carefully to avoid loss. Finally, the nucleic acid pellets were washed with precooled 70% ethanol and resuspended in RNase free water. DNA was removed by TURBO DNA-free Kit (Ambion, Thermo Fisher Scientific). RNA was purified with the RNeasy MinElute Kit (Qiagen). DNA removal was confirmed by negative amplification of archaeal and bacterial 16S rRNA genes. Extracted total RNA was quantified by Qubit 3.0 fluorometer with RNA HS Assay Kit (Thermo Fisher Scientific). A total of 0.5 μg RNA were converted to double-stranded cDNA using random primers and the Maxima H Minus Double-Stranded cDNA Synthesis Kit (Thermo Fisher Scientific). Then, the purified DNA (4 μg in total) and reverse transcript cDNA from RNA (500 ng in total) samples were sent to the Beijing Genomic Institute (BGI) for paired-end sequencing by the Illumina HiSeq 2000 sequencing platform.

De novo assembly and binning of metagenomic sequences
The raw shotgun sequencing metagenomic reads were trimmed with Sickle (https:// github. com/ najos hi/ sickle) using the default parameters. Trimmed, paired-end DNA reads were assembled using IDBA-UD (Peng et al. 2012) with the following parameters: mink 52, maxk 92, step 8. Binning of assembled metagenomic sequences was initially performed using tetra-nucleotide frequencies signatures in ESOM using 5 Kb as fragment length cutoffs (Dick et al. 2009). Ribosomal RNA (rRNA) genes were called with EMIRGE individually.

Metagenome annotation
Protein coding sequences (CDS) were determined using Prodigal (Hyatt et al. 2010) with meta parameter. For each predicted CDS, functional information was collected by comparing against NCBI NR (Non-Redundant database), KEGG (Kyoto Encyclopedia of Genes and Genomes) (Kanehisa and Goto 2000), COG (Clusters of Orthologous Groups) (Tatusov et al. 2003) and Pfam (Protein Families database) (Finn et al. 2014) with E-values < 10 −5 and inspected manually.

Metatranscriptome analysis
After filtering the low-quality and rRNA reads, the rest of the high quality reads were mapped to the metagenomeassembled contigs with TopHat v2.1.1 (Kim et al., 2013), and FPKM (expected fragments per kilobase of transcript per million fragments mapped) values were used to estimate the expression level of each gene using Cufflinks 2.2.1 scripts (http:// cole-trapn ell-lab. github. io/ cuffl inks/). The transcriptome data used in this study has been uploaded to the public database NODE (http:// www. biosi no. org/ node) under the ID "OED111522", at this moment it is kept private.

Metagenome and metatranscriptome-based integrated approach to detect sRNA from Guaymas Basin hydrothermal sediment
Retrieving non-redundant, highly expressed, non-rRNA, cDNA reads mapped to intergenic regions of the metagenome Low-quality cDNA reads were removed with sickle (https:// github. com/ najos hi/ sickle) using the default parameters (Joshi and Fass 2011) and cleaned cDNA reads were further scanned for the ribosomal RNA (rRNA) in Silva database (Quast et al. 2013) using SortMeRNA (Kopylova et al. 2012). Reads identified as rRNA were removed from the cDNA datasets. Resulting cDNA reads were mapped to metagenomic assembled contigs from the same sample using Bowtie2 (Langmead and Salzberg 2012) using default parameters. Reads mapped to proteincoding genes were rejected, keeping those mapped to the intergenic regions of the metagenome. Duplicated cDNA reads were counted using CD-HIT (Fu et al. 2012) and were subsequently removed. cDNA reads having more than a hundred copies were selected and these non-redundant, highly expressed, non-ribosomal, non-protein coding, cDNA reads mapped to intergenic region of metagenome from the same sample were analyzed further.

Clustering of non-rRNA and non-protein coding, cDNA reads and secondary structure prediction
The selected non-ribosomal cDNA reads mapped to IGR (4,157,571 reads out of 145,587,140 reads) of the metagenome were clustered to form distinct groups based on their sequence identity. Reads were clustered using CD-HIT (Fu et al. 2012) and all matches that met a minimum cutoff of 90% sequence identity over 90% average sequence length were clustered into a separate group. In total, 550 clusters were obtained.
To find sRNAs in methane metabolizing archaea ANME belonging to phylum Euryarchaeota, a Perl script was used to generate only those clusters having at least one representative sequence from ANME, where 10 such clusters were obtained. Twenty random representative sequences from each cluster were picked and aligned using CLUSTALW (Larkin et al. 2007) and were subsequently subjected to RNA secondary structure prediction tool RNAz (Washietl et al. 2005). Positively predicted clusters by RNAz were analyzed and discussed further. The strategy used to identify sRNAs using metagenomic and metatranscriptomic data from Guaymas Basin has been demonstrated in Fig. 1.

Molecular simulation
Molecular dynamics simulations were carried out with ff03 force field (Duan et al. 2003). The sRNA model was solvated within octahedron water box of TIP3P (Jorgensen et al. 1983) with a water thickness above 10 Å. Sodium ions were added for the charge neutralization. The solvated systems were subjected to 10,000-step minimization with the steepest descent method for the first 1000 cycles and then conjugate gradient method. After that these systems were gradually heated from 0 to 300 K for 50 ps controlled by the Langevin constrain with a collision frequency of 2 ps-1. Next constant pressure and temperature (NPT) ensemble were applied for 50 ps equilibrium. Finally, 50 ns production simulations were carried out under NPT ensemble, without any restraint. The Particle Mesh Ewald method (Darden et al. 1993) was employed to calculate long-range electrostatic interactions, and structural snapshots were flushed every 500 steps. All MD simulations were performed using the parallel version of pmemd implemented in AMBER 12 (Case et al. 2012). Root-mean-square deviation (RMSD)-based clustering was performed by average linkage clustering algorithms. The representative structures were extracted to describe the linear and looping regions of sRNA. The clustering and the routine RMSD were calculated by cpptraj in AMBER 12.

Interaction of common linear region on sRNA structure with mRNA
To decipher the RNA-RNA interaction, IntaRNA tool (Wright et al. 2014) was used and interaction of linear regions (LR) on sRNA with the mcrA mRNA was investigated.

Comparative genomics-based approaches for identifying sRNAs in ANME
The nearly complete genome of important uncultured methane metabolizing archaea ANME-2a (99.02% completed with 1.31% contamination using checkM) and ANME-2d (99.67% completed with 4.41% contamination using checkM) belonging to phylum Euryarchaeota have assembled in various metagenomic studies (Haroon et al. 2013;Wang et al. 2014). Nearly complete genomes of ANME-2a and ANME-2d were utilized in two independent comparative genomics-based approaches to find small regulatory RNAs in the present study. As most of the sRNAs identified up to date are located in the intergenic region, two different approaches were utilized to create the homologous intergenic dataset for sRNA identification. The combination of RNAz and RNA Infernal was applied to datasets created by both the approaches. RNAz detects evolutionarily conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments, which is not only accurate as compared to other available tools but also efficient as well. Knowledge-based approaches, with homologues of identified sRNAs for profiling, can be used as complementary to de novo ones. RNA Infernal, one of the knowledge-based sRNA identification tools, together with a de novo tool RNAz were utilized in this study. Methods for creating datasets are discussed below.

Method I
Intergenic regions from both ANME-2a and ANME-2d genomes and other 26 genomes belonging to family Methanosarcinaceae (as reference genomes) were extracted and were aligned using genome comparison method named progressive MAUVE (Darling et al. 2009), which identifies conserved genomic regions, rearrangements and inversions in conserved regions, and the exact sequence breakpoints of such rearrangements across multiple genomes. Thousands of datasets of aligned conserved intergenic regions were obtained. However, datasets that were found to contain regions of either ANME-2a or ANME-2d or both were selected and aligned using CLUSTALW. RNAz was applied on these aligned datasets to find if they represent a stable secondary structure. Datasets that were predicted as sRNA by RNAz were divided into smaller fragments and were subjected to motif searching tool RNA Infernal. Sequences giving an infernal score higher than 10 were considered as sRNA, criteria used by Xu et al. (Xu et al. 2009). However, only 4 sequences in ANME-2a and one in ANME-2d having an infernal score equal to or greater than '10', were obtained and considered as high confidence sRNAs. To extend our sRNA scanning across the two genomes an alternate method (discussed below) was applied.

Method II
As regulatory motifs were mainly found to be encoded in intergenic regions (IGR) of the genome, IGR dataset was generated by extracting sequences both from 5′ and 3′-untranslated regions (UTRs) of each gene in ANME-2a and ANME-2d, together with its orthologous IGR regions in all the other 26 genomes belonging to family Methanosarcinceae used in this study. The method is explained in details in a study by Nawaz et al. 2017(Nawaz et al. 2017). In line with the method I, a combination of RNAz and RNA infernal was applied to datasets generated using method II for sRNA identification.

Results and discussion sRNAs identified in deep-sea Guaymas Basin hydrothermal sediment
Use of metagenomic and metatranscriptomic data from the same sample provided an advantage for sRNA identification over identifying in any of this individual data. To our knowledge, this is the first metagenome and metatranscriptome-based integrated approach used for sRNA identification in deep-sea uncultured microbes. As illustrated in Fig. 1, the integrated approach for sRNA identification in metagenomic and metatranscriptomics for the Guaymas Basin sample resulted in 228 groups of sRNAs detected from metatranscriptomic reads (Supplementary Table S1). About 82% of them were found to have hits in the Rfam database (Nawrocki et al. 2015) while the rest of 18% were considered as having novel motifs which previously remained undiscovered. Many of the sequences belonging to different groups were predicted to have hits with microRNAs and CRISPERS. The 10 most abundant sRNAs detected in the metatranscriptome are listed in Table 1. Sequences from five groups including group number 100, 238, 240, 525 and 643 were found to have hits with well-known sRNA called carbon storage regulatory C (CsrC). CsrC is a post-transcriptional regulator of an important protein CsrA which have an important role in the regulation of central carbon flux, motility and biofilm formation (Argaman et al. 2001). Although, the length of sRNA reads here resembling with CsrC sRNA in Rfam is much shorter, i.e., 150 nt compared to 245 nt of the full length of sRNA. However, the predicted structure of this read was closely related to the structure of fulllength CsrC ( Figure S1). Results show that CsrC is highly expressed and therefore, likely playing an active role in Guaymas Basin microbes. Besides, Rfam hits of another sRNA involved in the regulation of iron-containing proteins 'FsrA' (Smaldone et al. 2012) was found with reads from two groups. Similarly, ydaO-yuaA which is known to act as cyclic-di-AMP sensing riboswitch (Nelson et al. 2013) was found to have hits with sequences belonging to four different groups in this study. In addition, cDNA reads from two groups were found to have hits with cyclic-di-GMP which is known to play its role in important mechanism such as biofilm formation (Valentini and Filloux 2016). Therefore, cyclic-di-AMP/GMP riboswitches are also active RNA regulatory structures in Guaymas Basin. Rfam hits of reads belonging to 6 groups were found as SmY, which are thought to have a role in mRNA splicing (Jones et al. 2009). Reads from 5 groups were found having hits with RsmY with a possible role in quorum sensing (Nakatsu et al. 2019) while two groups were found having hits with RsmZ which act as protein sponge (Duss et al. 2014). Similarly, group-II-D1D4-2 which is known to exist in high salt conditions with a possible role in mRNA splicing (Seetharaman et al. 2006), was found as hits in Rfam for reads belonging to four different groups. Group 10 was found to have a read with Rfam hit as isrJ which have a role in host cell invasion (Gong et al. 2011 Putative cis-sRNA of mcrA in ANME-1 The sRNA identification from Guaymas Basin was also specifically targeted to ANMEs where only one group of sRNAs was identified to have a representative sequence from ANME-1. The identified sRNAs groups were  (Silva et al. 2019) analyzed manually. Here, we report the identification of cDNA read that overlap the 5′-end of methyl coenzyme M reductase subunit A (mcrA) in ANME-1, intergenic region (1 nt) and methyl coenzyme M reductase subunit G (mcrG). It is not surprising that sRNA overlap with the coding regions of the genes, as they are known to evolve from mRNA, random transcriptions and tRNAs (Gottesman and Storz 2011). The stable secondary structure of this sequence determined using molecular dynamic tool Amber (Case et al. 2012) indicated that it is a stable and functional RNA. As cis-sRNAs are usually located at the 5′-end of their target gene (Weinberg et al. 2010), therefore, we suggest that this sRNA may regulate the mcrA gene in ANME-1. As transcription and translation occur simultaneously in bacteria, therefore, a high number of copies are not necessarily required for an sRNA to work at the level of transcription. High copy number of this cis-sRNA (131 copies) indicates that it might act as a post-transcriptional regulator of mcrA gene expression.
A functional RNA binds to other molecular structure through its linear region (Sharma et al. 2007;Peer and Margalit 2011;Richter and Backofen 2012), we select the common linear regions on all the possible structures of cis-sRNA to find its binding with mRNA. The linear region commonly exhibited by all the possible structures of sRNA was "CGU CCG CUUC" near the 3′-end of the sRNA (Position 129-138) ( Figure S2). To decipher the RNA-RNA interaction, IntaRNA tool (Wright et al. 2014) was used to investigate the interaction of linear regions (LR) on sRNA with the mcrA mRNA. Cis-sRNA was found to interact with its linear region (position 129-138) with the seed region of 7 bp to its mRNA. A region of 6-8 contiguous base pairs are generally involved in the sRNA-mRNA interaction (Gottesman and Storz 2011). Description of the interaction of linear common region on sRNA structure with mRNA is shown in Supplementary Figure S3, which is also demonstrated in Supplementary Table S2.
Putative cis-sRNA which overlaps with the 5′-end of the mcrA gene and transcribed in the opposite direction of its target mRNA, it might act as a riboswitch. Previous results have shown that methane metabolism archaea were abundant in the samples studied here (Wang et al. 2019), suggesting a high methanotrophic activity herein. We suspect that this sRNA might act as methane sensing riboswitch and under high concentration of methane, it binds to mRNA (mcrA), increasing the stability of mRNA which ultimately leads toward enhanced translation of mcrA protein and therefore, entailed in elevated methanotrophic activity and vice versa. However, more experimental evidences are needed to verify the anti-mcrA function of the sRNA in the future.

sRNA in ANMEs
To investigate if this type of cis-sRNA is widely present in methanogens and methanotrophs, we blasted cis-sRNA against methanogens and methanotrophic genome, and nucleotide database at NCBI. One hit with the same region on ANME-1 recovered previously from Eel River sediments, was found and its position was exactly the same as identified cis-sRNA, i.e., upstream of mcrA, with 93% sequence identity. Its associated mcrA gene in our sample has also best hit with mcrA gene recovered in Hallam's study (Hallam et al. 2003) with 93% sequence identity. However, this sRNA was not found in any of the other methane metabolizing archaea. The possible case might exist that other methanogens and methanotrophs could use sRNA for the same purpose but do not share sequence identity.
To investigate further whether similar sRNA structure exists in other methane metabolizing archaea, we extracted the same region in almost all the sequenced genomes of the methane metabolizers and predict sRNA using same methods as used in this study. However, sRNA was not found in any of the methanogen and methanotrophic group. As small RNAs are appeared to evolve rapidly (Gottesman and Storz 2011) and strain-specific sRNAs are also known to exist widely (Toffano-Nioche et al. 2012;Dugar et al. 2013); therefore, this cis-sRNA appeared as strain-specific sRNA regulating mcrA. ANMEs are not pure cultured yet in the laboratory. To test further if other sRNAs are present in ANMEs, the nearly complete genomes of ANME-2a and ANME-2d were screened as described in the Materials and Methods.

sRNAs predicted in ANMEs through Method I
Only 4 sequences in ANME-2a and one in ANME-2d having an infernal score equal to or greater than '10', were obtained and considered as high confidence sRNAs among all the datasets created.

sRNAs predicted in ANMEs through Method II
Initially, a total of 2051 sets of orthologous UTRs, of which 1175 from 5′-UTR and 876 from 3′-UTR, respectively, were extracted for sRNA prediction in ANME-2a. RNAz predicted 33 sets (1.6%, out of 2051) as sRNAs, with 25 and 8 from 5′ and 3′-UTRs, respectively. After searching against Rfam database using Infernal, only 4 (0.2%, out of total datasets) RNA motifs in ANME-2a were predicted with an Infernal score higher than 10, thus considered as reliable sRNA candidates. Similarly, in total 1436 sets of orthologous UTRs, of which 872 from 5′-UTR and 564 from 3′-UTR, respectively, were extracted for sRNA prediction in ANME-2d. RNAz predicted 19 sets (1.3%, out of 3673) as sRNAs, with 13 and 6 from 5′ and 3′-UTRs, respectively. After searching against the Rfam database using Infernal, none of the RNA motifs in ANME-2d was predicted with an Infernal score higher than 10.
Using both of the methods, we obtained 6 non-redundant sRNA in ANME-2a and only one in ANME-2d. Identified sRNAs and their flanking genes in ANME-2a and ANME-2d genomes are shown in Fig. 2. These sRNAs are flanked by the genes encoding archaeal flagellin N-terminal like domain, Aminopeptidase T, DisA (DNA integrity scanning protein), Epimerases, DUF772 (Domain of unknown function), 4Fe-4S dicluster domain, Superfamily II helicase, Orotate phosphoribosyltransferase, Cas3 (Crisper associated), Nucleic acid binding protein, and hypothetical proteins. Flagellin N-terminal like domain functions in recognition of flagellar export substrates (Chilcott and Hughes 1998;Hirano et al. 2003;Vegh et al. 2006;Weber-Sparenberg et al. 2006;Stafford et al. 2007). Aminopeptidases are a group of metalloprotease which are involved in the cleavage of free amino acids from peptides and are considered as essential enzyme because they perform various vital functions in the cell including hydrolysis of regulatory peptides, protein turnover, protein maturation, nitrogen nutrition, and modulation of gene expression (Taylor 1993;Matsui et al. 2006;Luckett et al. 2012). DisA functions in response to DNA-damage and stress homeostasis (Gandara and Alonso 2015). Epimerases are involved in the synthesis of complex polymers of carbohydrate which constitutes the prokaryotic cell walls and envelopes (Allard et al. 2001). 4Fe-4S dicluster domain is present in electron transferring proteins in prokaryotes where they play their role in important biological processes including nitrogen fixation, oxidative phosphorylation, and photosynthesis (Beinert 1990). Superfamily II helicase is involved in nucleic acid metabolism including DNA replication, DNA repair, transcription, translation, splicing and other related processes (Singleton et al. 2007;Lohman et al. 2008;Pyle 2008;Fairman-Williams et al. 2010). Orotate phosphoribosyltransferase is key enzyme involved in the synthesis of pyrimidine synthesis (Ozturk et al. 1995). Cas3 is the associated protein of CRISPR/Cas system that provides acquired immunity against viruses and foreign nucleic acids (Sinkunas et al. 2011). As the cis-encoded sRNAs are involved in regulating the gene expression of their nearby genes (Brantl 2007). The nearby genes of identified sRNAs in ANMEs suggest that these sRNAs might play their regulatory role in the important biological process including bacterial movement, nucleic acid and amino acid metabolism, stress tolerance, electron transportation, and bacterial defense system. For most of the predicted sRNA, their best Rfam hits are microRNAs that functions in RNA silencing and post-transcriptional regulation of gene expression. sRNAs are present in prokaryotes and have miRNAs in their eukaryotic counterparts. As we annotated the identified sRNAs against Rfam database which contain RNA families of both the prokaryotes and eukaryotes. Identified sRNAs having hits as Splicing RNA or mir-11 shows these sRNAs exhibit functional or sequence similarity with their eukaryotic counterparts and previously remain undetermined  Fig. 2 Position of sRNAs identified in ANME-2a and ANME-2d and their flanking genes in prokaryotes. Rfam hits of identified sRNAs in ANME-2a and ANME-2d are enlisted in Table 2.

Conclusions
Analysis of metagenomic and metatranscriptomic data lead to the identification of a massive number of sRNAs in Guaymas Basin (Gulf of California), demonstrating that microbes dwelling there exhibit diverse sRNAs which might have role in adaptation. We identified 228 groups comprising of hundreds of sRNAs. Rfam analysis demonstrated that most of them (82%) were predicted to have similarity with already discovered sRNAs. However, 18% of these identified sRNAs were more likely to be novel. The strategy applied to detect sRNAs here was designed to discover sRNAs with high expression, i.e., with more than a hundred copies of cDNA. High expression is required for sRNA only if it regulates the gene expression at the post-transcriptional level. Therefore, post-transcriptional regulators are the abundantly existing class of sRNA. The approach used here was particularly applied to focusing sRNAs in ANMEs. Interestingly, cisacting sRNA of mcrA gene, a key enzyme in methanogenesis, and AOM processes were identified in ANME-1 that have been recovered from Guaymas Basin hydrothermal sediment sample. Small RNAs have previously been discovered to regulate almost every global response in prokaryotes. However, this is the first ever discovery of sRNA related to methanogen or methanotrophy. Furthermore, two independent comparative genomics approaches were used to identify sRNAs in nearly complete genomes of ANME-2a and ANME-2d (Haroon et al. 2013;Wang et al. 2014) where six and only one sRNA was detected, respectively. Archaeal and bacterial sRNAs are almost structurally and functionally similar, however, the prediction of fewer sRNAs using comparative genomics-based approaches suggests that sRNAs in archaea are not evolutionary conserved as those in bacteria. Although functions of these sRNAs need to be investigated further using experimental approaches, this study provides the pioneering work on sRNA identification in ANME. Furthermore, our study investigated the active sRNAs in Guaymas Basin hydrothermal sediment sample and these sRNAs could be studied further to reveal their function therein.