Development of novel microsatellite markers for population differentiation and detection of natural selection in wild populations of butter catfish, Ompok bimaculatus (Bloch, 1794)

Butter catfish (Ompok bimaculatus) is a preferred species in South East Asia, with huge aquaculture potential. However, there is limited information about genetic stock composition due to insufficient markers. The goal of this study was to develop de novo microsatellite markers. For sequencing, genomic SMRT bell libraries (1.5 Kbp size) were prepared for O. bimaculatus. A total of 114 SSR containing sequences were used for primer designing. Polymorphic loci were validated by genotyping 83 individuals from four distant riverine populations, viz., Brahmaputra, Bichiya, Gomti and Kaveri. A total of 30 microsatellite loci were polymorphic, of which five were found to be associated with functional genes and eight (four positive and four negative) loci were found to be under selection pressure. A total of 115 alleles were detected in all loci and PIC ranged from 0.539 to 0.927 and pair-wise FST values from 0.1267 to 0.26002 (p < 0.001), with an overall FST value of 0.17047, indicating the presence of population sub-structure. Cross-species transferability of 29 loci (96.67%) was successful in congener species, Ompok pabda. The novel SSR markers developed in this study would facilitate stock characterization of natural populations, to be used in future selection breeding programs and planning conservation strategies in these species. Identified non-neutral markers will give insights into the effect of local adaptation on genetic differentiation in the natural population of this species.


Introduction
Catfishes are the most sought-after food fish and form important fisheries, with a global contribution of 123184.6 tonnes in 2020 [1]. Ompok bimaculatus, commonly known as Butter catfish or pabda has a wide distribution and is the preferred freshwater catfish in South and South East Asian countries, due to its taste, high nutritional value, soft bony structure and rich lipoprotein content. Moreover, it has adapted to cultural conditions, making it a favourable candidate species for aquaculture diversification. In India, this species is widely distributed in rivers, lakes and submontane regions, however, a decline in wild populations has been observed at an alarming rate [2]. This species has been declared an endangered species (nationally) in 1998 [3], however, was listed as Near Threatened in the IUCN Red List in 2009 [4]. Further, there is limited information about the genetic stock composition and genetic diversity, only using mitochondrial cyt b markers [2].
Microsatellites (SSRs) are markers of choice for the characterization of genetic resources due to high polymorphism, co-dominant inheritance and conformity to Mendelian segregation [5]. NGS technologies have been extensively used to discover microsatellites in various fish species [6,7]. In addition to neutral markers, it also facilitates the distinguishing of non-neutral markers or adaptive regions, which are influenced by natural selection, resulting in local adaptations [8].
The present study developed a panel of validated polymorphic microsatellite markers for the first time, for evaluating genetic diversity in wild populations of O. bimaculatus. In addition, markers under natural selection and genes associated with them were also identified. This information will be used for precise assessment of genetic variability in O. bimaculatus, which will assist in the management of stock for sustainable fisheries. Further, possibilities of cross-amplification of developed loci in another congeneric species, i.e., O. pabda will provide markers for population genetic studies.

Sample collection and DNA extraction
A total of 83 specimens of O. bimaculatus, collected from across four geographical locations i.e., Brahmaputra river (n = 18, Guwahati, Assam), Bichiya river (n = 19, Rewa, Madhya Pradesh), Gomti river (n = 37, Lucknow, Uttar Pradesh) and Kaveri river (n = 9, Chennai, Tamil Nadu) were used in the study (Supplementary Fig. 1). Majority of the specimens collected from the fisherman at the site of the collection were dead. Live fish collected from rivers, were euthanized with MS222 (Sigma Aldrich, USA). The muscle tissues were then collected and preserved in 95% ethanol in a 1:5 ratio (tissue sample: ethanol) at the sampling site and stored at 4 °C for long term use. The experiments were approved by Institute Animal Ethics Committee (IAEC), ICAR-NBFGR, vide No. G/IAEC/2020/022.

PCR amplification and genotyping
Primers were tested with traditional PCR amplifications in six individuals, from three different riverine locations. The PCR amplification was carried out in a 25 µl reaction volume containing approximately 50 ng of DNA, 1×PCR buffer, 25 mM dNTPs, 1.5 mM MgCl 2 , 5 pM primers (2.5 pM each forward and reverse primer) and 1.5 U Taq polymerase (Genei, Bengaluru, India). Amplification conditions were : denaturation at 95 °C for 5 min, followed by 30 cycles of 95 °C for 30 s, primer-specific annealing temperatures (T a ) for 30 s, 72 °C for 1.5 min, final extension at 72 °C for 10 min and hold at 4 °C. PCR amplicons were resolved in 10% native polyacrylamide gels (10 × 10.5 cm, Wipro GE Healthcare Pvt. Ltd., India) using 1×TBE buffer for about 5 h at 150 V. For visualization silver staining was done, using MspI digested PBR-322 as a standard size marker and molecular sizing of alleles was carried out on UVP Imaging System (Cambridge, UK).
A total of 83 O. bimaculatus individuals were genotyped from four different riverine locations using 30 polymorphic SSR loci, employing QIAxcel capillary electrophoresis system. QIAxcel DNA high-resolution kit (Qiagen, Hilden, Germany) was used for genotyping with QIAxcel Screen-Gel software v1.6.0. These amplified fragments were automatically scored for size polymorphisms in the specimens.
Loci under selection from genetic structure analysis were detected using Arlequin 3.5 [18] and BayeScan [19]. Arlequin uses coalescent simulations to get the p-values of locus-specific F-statistics conditioned on observed levels of heterozygosity whereas BayeScan implements a reversiblejump Markov chain Monte Carlo (MCMC) algorithm to estimate the posterior probability of balancing or purifying selection models.

Functional annotation of contigs regions containing microsatellites
All the polymorphic loci were searched for the associated functional gene using Basic Local Alignment Search Tool against the National Centre for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/). The position of the microsatellites associated with genes, in coding and non-coding regions, was identified using Augustus [20]. Identification of Molecular pathways for annotated genes was carried out using KAAS (KEGG Automatic Annotation Server) [21] database (FDR ≤ 0.15).

Cross-amplification of microsatellite markers
For estimating the cross-transferability of polymorphic microsatellite loci in Ompok pabda, cross-amplification of all the polymorphic loci was tested in a set of 13 individuals of O. pabda from three riverine populations i.e., Bichiya (Rewa, Madhya Pradesh), Mahanadi (Dhodrokusum, Odisha) and Brahmaputra (Kalangpar, Assam). The individuals were genotyped using QIAxcel Capillary Electrophoresis System for estimating allele size range and number of alleles.

Characteristics of microsatellite markers
Out of a total 30 SSR loci, five were associated with functional genes, identified with their best-matched protein 21 alleles at locus OBL1181), with a mean number of alleles per locus 10.93. The majority of loci observed low to moderate observed heterozygosity (H o = 0.025 in OBL1874 to 0.675, OBL49202) and moderate to high expected heterozygosity (He = 0.569, OBL133599 to 0.937, OBL1181), with mean observed and expected heterozygosity value of 0.268 and 0.807, respectively. The F IS, F ST and F IT values ranged from 0.022 to 0.955, 0.032 to 0.3674 and 0.0316 to 0.9718. All the loci were found to have genotypic differential potential ( Table 1). A significant deviation was observed from Hardy-Weinberg equilibrium (HWE) in seventeen loci (Supplementary Table 7) and the frequency of null alleles observed in all populations ranged from 0 to 0.5599 (n = 18, Brahmaputra). However, in Gomti with n = 37, the null allele frequency ranged from 0.0024 to 0.4285 (Supplementary Table 8) and the highest frequency was lower than 0.5, all the loci were included in the further analysis [22].
The calculated PIC values for all 30 loci were higher than 0.5, where OBL1181 having the highest PIC value (0.927), followed by OBL786 (0.881), OBL2442 (0.875) and the lowest PIC value observed in OBL133599 (0.539), with an average of 0.778. The number of alleles (N a ) ranged between 5 and 21 (5 alleles at locus OBL92038 and OBL1913; and   Analysis through MICROCHECKER indicated homozygote excess at these loci, but no evidence for scoring error, due to stuttering and large allele dropout, was observed. Significant cumulative Linkage Disequilibrium (LD) after Bonferroni correction (p < 0.001) was observed only between locus pair OBL949 × OBL958 (p = 0.000094 and chi 2 = 27.99), out of a total of 435 pairs (Supplementary Table 9), when tested for significance levels adjusted using Bonferroni correction [23] through multiple testing.

Detection of Loci under selection from genome scans based on F ST
Out of 30 microsatellite loci, eight (four positive and four negatives, Fig. 2; Table 2, a) and three (two positives and one negative, Table 2, b) loci were found to be under selection pressure through Arlequin and BayeScan respectively. And out of total 5 type II SSR loci, two loci i.e., one positive (associated with Insulin Like Growth Factor 2 Receptor, IGF2R) and one negative (Epsin-2, EPN2) were found to be under selection pressure. Both these genes fall under Endocytosis (ko4144) pathway, under Transport and Catabolism. The sequence of IGF2R and EPN2 from the present study aligned with NCBI Accession no. VMHU01008964.1 and

Discussion
Next-generation sequencing has become the technology of choice for Genome-wide identification and development of SSR markers [7]. In O. bimaculatus, the majority of repeats were dinucleotide (73.4%), followed by tetranucleotide repeats (14.06%) and trinucleotide repeats (11.72%) with low frequency of pentanucleotide and hexanucleotide repeats. A similar pattern of distribution of the repeat motifs was observed in three neotropical catfishes by Restrepo-Escobar [24]. In the present study, PIC values of thirty novel polymorphic SSR loci indicated that all value of 0.341(Supplementary Table 10). AMOVA results indicating the variance among populations and within individuals are given in Table 3.  Table 11). of existing habitat [27]. This might be the reason for the relatively lower genetic diversity within the population.

Cross-species amplification in congener species, Ompok pabda
The number (N a ) of alleles per locus ranged from 5 (OBL92038) to 21 (OBL1913), with an average allele per locus of 10.93, indicating higher genetic variability than the reported average allele number per loci (9.1) for freshwater species [28]. High N a was also observed in other catfish with high genetic diversity [25]. It was interesting to find the highest number of alleles (21) to be located in the intron of KCNK2 gene. It may be possible that the polymorphism thirty loci are highly informative, indicating the usefulness for genetic stock structure analysis [25], with the highest PIC value of 0.927 (OBL1181). With these 30 polymorphic loci, all four riverine populations revealed relatively lower genetic diversity within the population. It was reported that fishes with a large population size that migrate during the breeding season have high genetic diversity [26]. In contrast, O. bimaculatus is considered a local migrant and travels short distances for feeding or locating suitable breeding grounds in new water bodies, to avoid the stress conditions  moderate genetic differentiation was as per the threshold values suggested by Wright [35]. Above all, the level of variation remained at the same level, when the type I and type II markers were analyzed separately. However, the markers under balancing (positive) selection revealed lower genetic variation within individuals and higher differentiation. The balancing selection is exerted by the adaptive forces, resulting in the maintenance of genetic variation and increased diversity at the linked sites [36]. In the present data set, at the balancing loci (regime), population differentiation is increased due to the maintenance of different sets of alleles in four populations analyzed and it was proposed as the difference in fitness of the genotypes in populations, under the different environmental conditions [37].
Thus, in addition to the fact that the species under study is non-migratory in nature, the selection pressure working on the loci under study may contribute to the existence of high genetic differentiation within the same river system. A similar pattern of high genetic differentiation was observed in other catfishes, i.e., Indian Catfish, Clarias magur (F ST : 0.01383 to 0.62069) [38], giant river-catfish Sperata seenghala (0.135-0.173) [25], however, selection pressure has not been examined in these cases.
Genetic diversity analysis showed significant differences between different riverine populations across India, i.e., Kaveri river (South India), Brahmaputra (Northeast India, Assam), Bichiya a tributary of Ganga river (Central India, Madhya Pradesh) and Gomti, a tributary of Ganga river (North India, Uttar Pradesh). In general, the population of the same riverine system is expected to have low genetic variability as a result of genetic decline, genetic drift, selection and inbreeding [39]. However, in this study high genetic variability in populations Bichiya and Gomti rivers (p < 0.01) could be considered from different perspectives: (1) Bichiya river originated from the plateau of central India and before commencing into the Ganga river system it has to go through the barrage and dams in the river resulting in isolated population; (2) Kaveri river originated in Western Ghats and did not have any connectivity with Ganga and Brahmaputra river; (3) Brahmaputra and Ganga river is fragmented by Farakka barrage; (4) Non-migratory nature of the species and (5) recent genetic bottlenecks in the wild populations.
The departure from Hardy-Weinberg equilibrium was observed in seventeen loci, which might be due to homozygote excess (as indicated by positive F IS values). However, the presence of a significant level of null alleles was only in two loci (OBL894 and OBL1957) and it may be due to the small number of samples studied in the present study, thus Hardy-Weinberg may be violated [29]. The significant value of genetic differentiation (overall F ST ) agreed with that of the results obtained through molecular variance. Positive F IS values in loci may be due to a reduction in heterozygotes, which may be the result of a reduction in population size by overexploitation, adverse ecological changes, localized isolated population and non-migratory nature [27].
Under the current study, analysis of five gene-associated SSR markers (Type II) showed a lower overall pairwise F ST value of 0.1574, as compared to that of neutral loci. It has been reported that in contrast to neutral markers (microsatellites in noncoding regions), gene-associated microsatellites marker might be more susceptible to selection pressure and therefore, showed low values of gene diversity which is concurrent to the current study [25].
The present study revealed the effects of divergent and balancing selection on eight loci (four each), which also included two gene-associated loci. It was interesting to find both the associated genes under selection pressure have been reported to respond to environmental changes. One marker is associated with the Insulin-Like Growth Factor 2 Receptor (IGF2R), which is a cell-surface receptor and known to have a binding site for Insulin-Like Growth Factor 2 (IGF2), where it endocytoses the growth factor and regulates the concentration of IGF2 within tissues in mammals [30]. It was reported that growth-related diseases have been found to be associated with the deregulation of expression of IGF2R in humans. However, in a teleost, the role of IGF2 is not clear in growth [31], and was found to be divergent selective pressures for adaptation in sticklebacks [32]. Another gene, under negative selection is Epsin-2, which encodes a protein, involved in clathrin-mediated endocytosis [33], which plays important role in nutrient uptake and cell-to-cell communications [34]. However, little information on its function in fish is available. Nevertheless, both the genes found to be under selection pressure belong to the same pathway: Endocytosis, important for regulating cell signaling and antigen presentation (keg.jp/entry/ko4144). It has also been reported to have an adaptive response in plasma membrane remodelling in the face of environmental changes [34].
With all 30 loci, the major genetic variation, present among individuals within populations with overall a considerable number of hydropower dams and barrages in the river, which can affect migration, egg and larvae drift of fish [42] and subsequent high genetic differentiation of the Bichiya population from Gomti and other populations. Alteration of the migratory flow, leads to a reduction or interruption of gene flow, consequently reducing the population size and making it more susceptible to the effects of genetic drift [43], which results in a genetic structure in fishes [44]. Furthermore, higher genetic diversity could be expected, as rivers belong to distant geographical locations and it is likely that the populations investigated could have evolved in isolation after fragmentation.
The cross-species amplification efficiency is inversely proportional to the phylogenetic distance between the two species [45]. Based on the mitogenome sequences analysis, O. pabda showed the highest similarity to O. bimaculatus in the genus with 86% identity [46]. Cross-amplification of 30 polymorphic loci of O. bimaculatus successfully amplified for 29 polymorphic loci in O. pabda and the allele size pattern of O. pabda was similar to that of O. bimaculatus. Cross-species amplification can be successfully used in genotyping and genetic stock characterization in O. pabda which will help in saving resources, effort and time. It has also been successful in other catfishes i.e., four related species of Silonia silondia [47] and endangered catfish Rita rita [48].

Conclusion
In the current study, thirty microsatellite markers have been developed for the first time in O. bimaculatus, out of which eight are non-neutral markers, which are influenced by natural selection. These markers would help in studying the selection pressure enforced by environmental conditions and the adaptation of the natural population to the local environment. Such loci will give insights into the effect of local adaptation on genetic differentiation in the natural population of this species.