This project was conducted in accordance with the relevant national and international guidelines and did not involve any endangered or protected fish species. All fish specimens were collected from local fishermen, caught by the authors using non-invasive fishing gear, or bought from the local market. The study was carried out following the recommendations of, and with approval from, the Universiti Sains Malaysia Animal Ethics Committee.
A total of 441 specimens were sampled between December 2018 and October 2019 at multiple locations along the Merbok Estuary and its vicinity (Figure 1). Specimens were obtained from local fishermen using the barrier-net method locally called ‘pompang’, collected directly by dip-net, or bought from the major fish landing site (Kuala Muda Whispering Market). All specimens were caught within the Merbok River and its adjacent waters. Samples from the fish landing site were retrieved from fishing vessels operating within Zone A (from the shoreline up to 5 nautical miles) and Zone B (from 5 to 12 nautical miles)57. The sampling localities (geographical coordinates) are listed in Table S1. Other collection data (dates, taxonomy, and voucher specimen details) can be retrieved from the online project datasheet in BOLD under the project code DBMR.
Sample processing and morphological identification
A fin clip was taken from each fresh specimen and stored in 90% ethanol. Voucher specimens were fixed in 10% formalin for at least one week and then transferred to 70% ethanol for long-term storage. All specimens were catalogued and deposited at the Museum of Biodiversity, Universiti Sains Malaysia.
Morphology-based species identifications and nomenclature follow15. A few specimens could not be unequivocally assigned to a valid described species using the available keys; in these cases, we used the qualifiers “sp.”, “cf.”, or “aff.”.
Genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Germany) following the manufacturer's protocol for animal tissues. The purity and concentration of the isolated DNA were measured using a microvolume UV spectrophotometer (Quawell Q300, Quawell, CA), and the DNA was stored at -20 °C until further use. A fragment of approximately 650 bp of the mitochondrial COI gene was amplified using combinations of the following primers previously designed by22:
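The spectrophotometer reports concentration and purity directly, but the arithmetic behind those readings, and the dilution needed to bring a stock to the 50 ng/µL template concentration used for PCR, can be sketched as follows. This is a minimal illustration rather than the instrument's internal algorithm: it assumes the conventional factor of about 50 ng/µL per A260 unit for double-stranded DNA at a 10 mm path length, and the function names and example readings are hypothetical.

```python
# Assumed conversion for double-stranded DNA: A260 = 1.0 over a 10 mm path
# corresponds to about 50 ng/uL. Microvolume instruments typically measure over
# a shorter path and report values normalized to 10 mm; this sketch assumes
# normalized readings.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """dsDNA concentration (ng/uL) from a blank-corrected, 10 mm-normalized A260."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280; values near 1.8 are conventionally read as protein-free DNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def dilution_volumes(stock: float, target: float, final_ul: float) -> tuple[float, float]:
    """(stock uL, water uL) giving `final_ul` at `target` ng/uL, from C1*V1 = C2*V2."""
    if stock < target:
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target * final_ul / stock
    return stock_ul, final_ul - stock_ul


if __name__ == "__main__":
    conc = dna_concentration(3.0)  # hypothetical A260 reading
    print(conc, purity_ratio(3.0, 1.6), dilution_volumes(conc, 50.0, 30.0))
```

With a hypothetical reading of A260 = 3.0, the stock is 150 ng/µL, and 10 µL of it plus 20 µL of water gives 30 µL at 50 ng/µL.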
Each sample was amplified in a final volume of 25 µL containing 5.5 µL of 5x MyTaq™ Reaction Buffer Red (Bioline GmbH, Germany), 0.5 µL of each primer (100 ng/µL), 0.25 µL of 5U Taq polymerase (iNtRON Biotechnology Inc., Korea), 2.5 µL of genomic DNA (50 ng/µL), and nuclease-free water to make up the final reaction volume. Each amplification set included a negative control (no template DNA). Thermal cycling conditions were as follows: initial denaturation at 94°C for 4 minutes; 35 cycles of denaturation at 94°C for 30 seconds, annealing at 48°C for 50 seconds, and extension at 72°C for 1 minute; and a final extension at 72°C for 10 minutes. The PCR products were separated by 2% agarose gel electrophoresis to check for successful amplification. All positive amplicons were sent to Apical Scientific Sdn. Bhd. (Selangor, Malaysia) for purification and sequencing on an ABI PRISM 3730XL automated sequencer using the ABI PRISM BigDye Terminator Cycle Sequencing Kit v3.1 (Applied Biosystems, Foster City, CA). Sequencing was performed bidirectionally to reduce the probability of sequencing errors.
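The reaction recipe fixes every component except water, so the water volume per reaction and a scaled master mix follow from simple arithmetic. The sketch below is illustrative only: the 10% pipetting overage is a hypothetical allowance not stated in the protocol, template DNA is kept out of the master mix because it is added to each tube, and the cycling-time estimate ignores ramp rates.

```python
# Per-reaction volumes (uL) as listed in the protocol; water makes up the rest.
FINAL_VOLUME_UL = 25.0
TEMPLATE = "genomic DNA (50 ng/uL)"
COMPONENTS_UL = {
    "5x MyTaq Reaction Buffer Red": 5.5,
    "forward primer (100 ng/uL)": 0.5,
    "reverse primer (100 ng/uL)": 0.5,
    "Taq polymerase": 0.25,
    TEMPLATE: 2.5,
}


def water_per_reaction() -> float:
    """Nuclease-free water (uL) needed to bring one reaction to the final volume."""
    used = sum(COMPONENTS_UL.values())
    if used > FINAL_VOLUME_UL:
        raise ValueError("components exceed the final reaction volume")
    return FINAL_VOLUME_UL - used


def master_mix(n_reactions: int, overage: float = 0.10) -> dict[str, float]:
    """Volumes (uL) for a master mix covering n reactions.

    Template DNA is excluded because it is pipetted into each tube separately.
    The default 10% overage is a hypothetical allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    mix = {name: vol * factor for name, vol in COMPONENTS_UL.items() if name != TEMPLATE}
    mix["nuclease-free water"] = water_per_reaction() * factor
    return mix


def cycling_minutes(cycles: int = 35) -> float:
    """Programmed hold time (min), ignoring ramp rates:
    4 min initial denaturation + cycles x (30 s + 50 s + 60 s) + 10 min final extension."""
    return 4.0 + cycles * (30 + 50 + 60) / 60.0 + 10.0


if __name__ == "__main__":
    print(f"water per reaction: {water_per_reaction()} uL")
    for name, vol in master_mix(12).items():
        print(f"{name}: {vol:.2f} uL")
    print(f"cycling time: {cycling_minutes():.1f} min")
```

With the listed volumes, each reaction takes 15.75 µL of water and receives 125 ng of template (2.5 µL × 50 ng/µL), and the programmed holds add up to roughly 96 minutes before ramping is counted.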
Each chromatogram was manually screened before alignment in MEGA X58. The sequences were proofread, aligned independently, and then inspected for insertions, deletions, and stop codons in the same software. All sequences have been uploaded to BOLD54 and deposited in GenBank55 (accession nos. MW498499-MW498843).
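A stop-codon screen on COI must use the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TGA encodes tryptophan while AGA and AGG terminate translation. The sketch below scans an aligned sequence for in-frame stops; it is not MEGA X's implementation, the function names are hypothetical, and the reading frame is an input the user must supply (a helper reports which of the three frames are free of stops).

```python
# Stop codons under NCBI translation table 2 (vertebrate mitochondrial code).
# TGA is NOT a stop here (it encodes Trp); AGA and AGG are.
VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def internal_stops(seq: str, frame: int = 0) -> list[int]:
    """0-based positions of in-frame stop codons in an aligned sequence.

    `frame` is the offset (0, 1 or 2) of the first full codon. Codons that
    contain gaps or IUPAC ambiguity codes never match a stop and are ignored.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    s = seq.upper()
    return [i for i in range(frame, len(s) - 2, 3) if s[i:i + 3] in VERT_MITO_STOPS]


def open_frames(seq: str) -> list[int]:
    """Frames (0, 1, 2) containing no in-frame stop codon. A functional COI
    fragment should have no stop in its true reading frame."""
    return [f for f in range(3) if not internal_stops(seq, f)]
```

A sequence whose expected frame shows a stop, or that has no open frame at all, would be flagged for re-inspection of its chromatogram or as a possible pseudogene.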
A total of 350 COI sequences were generated in this study. To assess taxon discrimination among all specimens, pairwise genetic distances within and between species, genera, and families were calculated under the Kimura 2-parameter (K2P) model59 using the analytical tools of the BOLD system. To visualize the genetic relationships among the sequences, Bayesian inference (BI) and maximum likelihood (ML) analyses were run in BEAST 260 and raxmlGUI 2.061, respectively. GTR+I+G was selected as the best-fit substitution model in PartitionFinder 262, as implemented in the CIPRES portal63. The BI tree was constructed under the GTR+I+G model with empirical base frequencies and four gamma categories, a relaxed lognormal clock, and a birth-death tree prior. Two independent Markov chain Monte Carlo (MCMC) runs of 40 million generations each were sampled every 1000 generations, and the first 20% of each run was discarded as burn-in. Convergence of both runs was assessed in Tracer 1.7.1 (effective sample size, ESS > 200), after which the runs were combined in LogCombiner 2.4.8 and the final tree was summarized in TreeAnnotator 2.4.7, within the BEAST 2 package60. The ML tree was also built under the GTR+I+G model, with 1000 nonparametric bootstrap replicates. Both trees were viewed and edited in FigTree 1.4.464.
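The K2P model counts transitions (A↔G, C↔T) and transversions separately. With P and Q the proportions of transitional and transversional differences among the compared sites, the distance is d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q). The sketch below implements that formula together with a simple barcode-gap check (largest within-group versus smallest between-group distance). It is an illustration, not BOLD's code: skipping gapped or ambiguous sites pairwise is an assumption, and the function names are hypothetical.

```python
import math
from itertools import combinations

VALID = frozenset("ACGT")
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Sites with a gap or non-ACGT symbol in either sequence are skipped
    (pairwise deletion; an assumption, BOLD's handling may differ).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def barcode_gap(records: list[tuple[str, str]]) -> tuple[float, float]:
    """records: (group label, aligned sequence) pairs, e.g. species names.

    Returns (max within-group distance, min between-group distance);
    a second value larger than the first suggests the groups are separable.
    """
    within, between = [], []
    for (g1, s1), (g2, s2) in combinations(records, 2):
        d = k2p_distance(s1, s2)
        (within if g1 == g2 else between).append(d)
    return max(within, default=0.0), min(between, default=math.inf)
```

The same routine applies at the genus or family level simply by passing genus or family names as the group labels.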
Three sequence-based methods were used to delimit molecular operational taxonomic units (MOTUs) from the analysed sequences: (1) Refined Single Linkage (RESL), (2) Automatic Barcode Gap Discovery (ABGD), and (3) the Generalized Mixed Yule Coalescent (GMYC). First, sequences were assigned to Barcode Index Numbers (BINs) within the BOLD platform using the RESL algorithm65. Next, ABGD39 was run on the web server (https://bioinfo.mnhn.fr/abi/public/abgd/abgdweb.html) to partition the dataset into candidate species based on gaps in the distribution of pairwise divergences. ABGD was run with a relative gap width of X = 1.0 and prior intraspecific divergence (P) values ranging from 0.001 to 0.0059 for all distance metrics, with all other parameters kept at their defaults. Finally, the GMYC method66 was applied to the fully resolved, ultrametric BI COI tree described above. A single-threshold GMYC analysis was run in RStudio67 using the ‘splits’ package.
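RESL begins by grouping sequences through single linkage at a distance threshold and then refines those clusters; ABGD and GMYC rely on different criteria (a gap in the pairwise-distance distribution, and a likelihood-based shift from Yule to coalescent branching, respectively). The sketch below shows only the single-linkage step, using union-find on a precomputed distance matrix. It is not the full RESL algorithm, and the threshold and function name are hypothetical.

```python
from itertools import combinations


def single_linkage_motus(ids: list[str], dist: list[list[float]],
                         threshold: float) -> list[set[str]]:
    """Group sequences into clusters: any pair at distance <= threshold is linked,
    and links are transitive (single linkage). `dist` is a symmetric matrix
    indexed like `ids`."""
    parent = list(range(len(ids)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(ids)), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, set[str]] = {}
    for i, name in enumerate(ids):
        clusters.setdefault(find(i), set()).add(name)
    # Sort by smallest member for deterministic output.
    return sorted(clusters.values(), key=min)
```

Single linkage chains: with a threshold of 0.02, sequences at distance 0.025 still fall into one cluster if an intermediate sequence lies within 0.02 of both, which is one reason RESL adds a refinement stage.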
All COI sequences analysed in this study are available in the BOLD system under the DBMR project and can also be retrieved from GenBank (accession nos. MW498499-MW498843).