Characterization of 58 STRs and 94 SNPs with the ForenSeq™ DNA signature prep kit in Mexican-Mestizos from the Monterrey city (Northeast, Mexico)

STR allele frequency databases from populations are necessary to take full advantage of the increased power of discrimination offered by massively parallel sequencing (MPS) platforms. For this reason, we sequenced 58 STRs (aSTRs, X-STRs, and Y-STRs) and 94 identity-informative SNPs (iiSNPs) in 105 Mestizo (admixed) individuals from Monterrey City (Northeast, Mexico) with the Primer Set-A of the ForenSeq™ DNA Signature Prep Kit. Most of the STR markers were in Hardy-Weinberg equilibrium, with a few exceptions. We found 346 different length-based alleles for these 58 STRs; this number rose to 528 when sequence variation was assessed. The combined power of discrimination of the autosomal STRs (aSTRs) was virtually 100% for both length-based and sequence-based alleles, while the power of exclusion was 99.9999999976065% and 99.9999999999494%, respectively. Haplotypes based on X-STRs and Y-STRs showed a discriminatory capacity of 100%. These results provide, for the first time, forensic genomic population data from Mexico that are necessary for interpretation in kinship and criminal analyses.


Introduction
Analysis of PCR products by massively parallel sequencing (MPS) overcomes many of the limitations encountered with capillary electrophoresis (CE) analysis. MPS is getting
José Alonso Aguilar-Velázquez and Miguel Ángel Duran-Salazar have contributed equally to this work.
Regarding the inclusion of MPS in forensic genetics, the DNA Commission of the International Society of Forensic Genetics (ISFG) promotes the generation of population allele frequency databases to take full advantage of the increased power of discrimination offered by MPS-generated data [13]. Unfortunately, this task has scarcely been undertaken in Latin America, where, to the best of our knowledge, only one Peruvian population has been described [6]. In this paper, we report allele and haplotype frequencies and forensic parameters for the markers included in the DPS-A of the ForenSeq™ DNA Signature Prep Kit in the Mexican Mestizo (admixed) population from Monterrey City (Northeast, Mexico), the second most important economic, cultural, and political metropolis of the country.

Subjects and methods
Population sample and DNA extraction method
DNA was obtained from peripheral blood of 105 unrelated residents (43 males and 62 females) of Monterrey City, Nuevo Leon state (Northeast, Mexico), using the PrepFiler Express BTA™ Forensic DNA Extraction Kit. For this purpose, the AutoMate Express DNA extraction system was used according to the supplier's instructions (Applied Biosystems). Next, DNA was quantified with the Quantifiler® Trio DNA quantification kit in a 7500 Applied Biosystems Real-Time PCR system. Volunteers signed an informed consent form before their inclusion in the study, according to the Helsinki Declaration Ethical Guidelines. This work was approved by the Ethical Research Committee at the Institute of Criminalistics and Forensic Services of the Attorney General of Nuevo Leon (Project number assigned: IC-017-2019). The anonymity of the participants will be preserved at all times.

Massively parallel sequencing (MPS) method
Libraries were generated using the DPS-A of the ForenSeq™ DNA Signature Prep Kit (Verogen®, San Diego, CA, USA). Library preparation involved PCR to amplify the DNA targets (STRs and iiSNPs) and incorporate dual-indexed adapters [12]. The libraries were normalized and then pooled, with 12 to 32 samples pooled per run. The library pool was diluted, denatured, and then loaded into the MiSeq FGx™ reagent Standard and Micro Kits for cluster generation on the flow cell according to the manufacturer's recommendations. Sequencing was conducted following the procedures outlined in the MiSeq FGx™ Instrument Reference Guide [14]. After each sequencing run was completed, a post-run wash was performed. We carried out five sequencing runs, two with 12 samples on Micro flow cells and three with 32 samples on Standard flow cells, including positive and negative amplification controls in each run. Sequencing results were analyzed with the Universal Analysis Software (UAS) provided by the manufacturer, using its default parameters for variant calling, and then exported to an Excel sheet.

Data analysis
STR allele sequences were analyzed with the R scripts (IFator for autosomal STRs, YIFator for Y-STRs, and XIFator for X-STRs) previously reported by Casals et al. [3] and available on GitHub (https://github.com/fcalafell/). The sequences reported herein were then compared against a set of reference sequences built from previous sequencing results [3] and updated with our results. The R scripts provide the following results: (i) repeat sequence-based (RSB) and length-based (LB) allelic frequencies (aSTRs and Y-STRs); (ii) haplotype frequencies of both RSB and LB for Y-STRs; (iii) number of different haplotypes for X- and Y-STRs; and (iv) statistics of forensic interest, such as the power of discrimination and chance of exclusion for autosomal STRs. For practical purposes, we used the notation for RSB alleles proposed by Casals et al. [3], taking into account the suggestions of the DNA Commission of the ISFG [13]. The notation consists of the repeat number as provided by UAS followed by a lowercase letter that differs for each distinct sequence (e.g., allele 16a, 16b, or 16c). For a more detailed explanation, please see the Material and methods section of [3]. The full list of RSB variants and their notation can be found in the Supplementary Material for aSTRs, X-STRs, and Y-STRs, respectively (SM1-SM3). Moreover, the following analyses were computed using the GenAlEx add-in for Excel [15] and the GDA version 1.1 software [16]: (i) Fisher exact tests to evaluate Hardy-Weinberg equilibrium (HWE) per locus and linkage equilibrium (LE) between pairs of loci for aSTRs, iiSNPs, and X-STRs (only females included, n = 63); (ii) allelic frequencies and statistics of forensic interest for the X-STRs and iiSNPs; and (iii) expected and observed heterozygosity of the aSTRs and iiSNPs.
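The RSB notation described above (repeat number plus a lowercase letter per distinct sequence) can be illustrated with a minimal Python sketch. This is not the IFator script itself: the function name, the made-up bracketed sequences, and the rule of assigning letters in order of first appearance are assumptions for illustration only. In the study, letters are fixed by a shared reference set of sequences so that labels remain comparable across datasets.

```python
from collections import defaultdict
from string import ascii_lowercase


def assign_rsb_labels(observations):
    """Label (repeat_number, sequence) pairs as '16a', '16b', ...

    Assumption: letters are handed out in order of first appearance within
    each repeat number, and no repeat number has more than 26 variants.
    """
    seen = defaultdict(dict)  # repeat number -> {sequence: letter}
    labels = []
    for repeat_number, sequence in observations:
        letters = seen[repeat_number]
        if sequence not in letters:
            letters[sequence] = ascii_lowercase[len(letters)]
        labels.append(f"{repeat_number}{letters[sequence]}")
    return labels


# Hypothetical sequences, not taken from the supplementary material
example = [
    ("16", "[TCTA]16"),
    ("16", "[TCTA]15 TCTG"),
    ("16", "[TCTA]16"),
    ("17", "[TCTA]17"),
]
rsb_labels = assign_rsb_labels(example)
```

Here the two length-16 alleles that differ in sequence receive different letters, which is how one LB allele can split into several RSB alleles.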
Although we applied the Bonferroni correction to evaluate multiple Fisher exact tests p-values, we also assessed departures from HWE and pairwise LE for the whole dataset using the truncated product method of Zaykin et al. [17]. It is important to note that these tests are not independent; thus, the truncated product method can only be used as a guide. In addition, due to the large numbers of HWE and LE tests, the observed and expected p-values were represented as p-p plots with the software IBM SPSS, as described to validate population databases for forensic purposes [18].
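As a rough illustration of the two multiple-testing approaches mentioned above, the following Python sketch computes a Bonferroni threshold and a Monte Carlo p-value for the truncated product statistic (the product of all p-values at or below a truncation point τ). The function names, τ = 0.05, and the simulation settings are assumptions; the simulation also assumes independent tests, which, as noted above, does not strictly hold for these data.

```python
import math
import random


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def truncated_log_product(p_values, tau=0.05):
    """Log of the truncated product: sum of log(p) over p <= tau (0.0 if none).

    Assumes every p-value is strictly positive.
    """
    return sum(math.log(p) for p in p_values if p <= tau)


def tpm_p_value(p_values, tau=0.05, n_sim=20000, seed=1):
    """Monte Carlo p-value of the truncated product under independent uniform nulls."""
    rng = random.Random(seed)
    observed = truncated_log_product(p_values, tau)
    n_tests = len(p_values)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], so log() never receives zero
        simulated = [1.0 - rng.random() for _ in range(n_tests)]
        if truncated_log_product(simulated, tau) <= observed:
            hits += 1
    return hits / n_sim
```

A call such as `tpm_p_value(hwe_p_values)` (with `hwe_p_values` a hypothetical list of per-locus p-values) returns a single dataset-wide p-value, whereas `bonferroni_threshold(0.05, 27)` gives the per-locus cutoff for 27 tests.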

Results
Fifty-eight STRs and 94 iiSNPs were sequenced for 105 samples from Monterrey City, Mexico, which is in line with the guidelines for the publication of genetic population data generated by MPS published by Gusmão et al. [19], who recommended at least 50 full genotypes. For the three runs, quality metrics (cluster density, clusters passing filter, phasing, and pre-phasing) were within the limits defined by the manufacturer. All negative controls were blank, and all positive controls (2800M, supplied with the ForenSeq™ DNA Signature Prep Kit) gave full and expected profiles in all runs.

Allele frequencies and forensic parameters for aSTRs
The forensic parameters and allelic frequencies for LB and RSB variation for the 27 autosomal STRs are reported as supplementary material (SM4-SM5). Furthermore, 105 different RSB genotypes were observed for the 27 aSTRs (SM6). A total of 252 different alleles were found when the allele calls were based only on the number of repeats. However, the number of alleles rose to 367 (46% more) when the sequence variation was analyzed; the increase involved 19 autosomal STRs (Fig. 1a). The largest increases in allele diversity were observed in D12S391 (13 vs. 34), D21S11 (12 vs. 28), and D2S1338 (10 vs. 26). Conversely, no increase in allele number was observed in the following aSTRs: CSF1PO, D10S1248, D16S539, D20S482, D22S1045, PentaD, TH01, and TPOX. As expected, the observed heterozygosity of the aSTRs with RSB variation increased markedly (SM4-SM5). The combined power of discrimination was virtually 100% for both LB and RSB variation, whereas the combined power of exclusion for RSB and LB was 99.9999999999494% and 99.9999999976065%, respectively. As expected, the combined power of exclusion was slightly higher for RSB than for LB alleles, which results from the increase in allele number for some STRs when the sequence variation is analyzed (SM4-SM5).
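The combined parameters reported above can be sketched in Python under standard textbook definitions, which are assumed here because the text does not spell out the formulas used by the R script: the per-locus power of discrimination is taken as one minus the sum of squared genotype frequencies, and per-locus values are combined across independent loci as one minus the product of their complements. Function names and genotypes are illustrative.

```python
import math
from collections import Counter


def power_of_discrimination(genotypes):
    """Per-locus PD = 1 - sum of squared observed genotype frequencies."""
    n = len(genotypes)
    # Sort alleles so that ('12', '13') and ('13', '12') count as one genotype
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def combine(per_locus_values):
    """Combined PD or PE across loci, assuming the loci are independent."""
    return 1.0 - math.prod(1.0 - v for v in per_locus_values)


# Made-up genotypes at a single locus
toy_locus = [("12", "13"), ("13", "12"), ("12", "12"), ("14", "15")]
toy_pd = power_of_discrimination(toy_locus)
toy_combined = combine([0.9, 0.9])
```

Because each additional locus multiplies the remaining chance of a coincidental match, the combined value approaches 1 quickly. Splitting LB alleles into RSB alleles spreads the genotype frequencies over more classes, which raises the per-locus values and hence the combined ones.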

Allele frequencies and forensic parameters for Y-STRs
For the 24 Y-STRs, 39 complete haplotypes were sequenced, all of which were different in both LB and RSB variation (SM10-SM11, respectively), rendering a discriminatory capacity of 100% in both cases. Therefore, RSB variation did not increase the number of haplotypes, because LB variation was sufficient to differentiate all of them. When the LB diversity was assessed, we obtained 67 different alleles (SM12). However, 105 different alleles were observed when the allelic RSB diversity was analyzed (SM13). RSB variation was detected in the following nine Y-STRs: DYF387S1, DYS389II, DYS390, DYS448, DYS481, DYS505, DYS549, DYS570, and DYS635 (Fig. 1c).
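Discriminatory capacity is commonly defined as the number of different haplotypes divided by the number of haplotypes sampled; assuming that definition applies here, a minimal Python sketch follows. The haplotypes are invented three-locus tuples used only to show the calculation.

```python
def discriminatory_capacity(haplotypes):
    """Fraction of sampled haplotypes that are distinct."""
    return len(set(haplotypes)) / len(haplotypes)


# Hypothetical length-based haplotypes (one tuple of allele calls per male)
all_unique = [("14", "29", "24"), ("13", "30", "24"), ("14", "30", "23")]
with_repeat = all_unique + [("14", "29", "24")]

dc_unique = discriminatory_capacity(all_unique)
dc_repeat = discriminatory_capacity(with_repeat)
```

When every LB haplotype is already unique, as with the 39 haplotypes reported above, sequence information cannot raise the discriminatory capacity any further, even though it increases the number of alleles per locus.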

Allele frequencies and forensic parameters for iiSNPs
Forensic parameters, allele frequencies, and HWE analysis of the 94 identity-informative SNPs are reported as supplementary material (SM14). The average expected heterozygosity was 0.4427, close to the maximum possible value for biallelic markers (0.5), which is similar to previous descriptions in European populations [3,4]. For the 94 iiSNPs, the combined power of discrimination was 99.99999999999999999999626 (∼1), while the power of exclusion was slightly lower (99.99577%).
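The ceiling of 0.5 follows from the usual definition of expected heterozygosity as one minus the sum of squared allele frequencies. The short Python sketch below uses that definition without any small-sample correction, which is an assumption about how the reported values were computed; the allele frequencies are illustrative.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity, He = 1 - sum(p_i^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in allele_freqs)


he_balanced = expected_heterozygosity([0.5, 0.5])  # maximum for a biallelic SNP
he_skewed = expected_heterozygosity([0.9, 0.1])    # far less informative SNP
```

For a biallelic marker this reduces to 2p(1 - p), which peaks when both alleles have a frequency of 0.5. An average close to that ceiling therefore indicates that most of the SNPs in the panel have well-balanced allele frequencies in the population.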
In addition, to avoid overly conservative Bonferroni corrections given the large number of HWE and LE tests, the truncated product method was applied to the complete datasets of 27 aSTRs, 7 X-STRs, and 94 iiSNPs, respectively (Figs. 2, 3 and 4) [17]. HWE was supported for aSTRs and iiSNPs, with low but non-significant p-values, whereas X-STRs departed from Hardy-Weinberg expectations (Figs. 2 and 4). Conversely, LE was supported for both STR datasets (Fig. 3), whereas significant LD was confirmed in the iiSNP dataset (Fig. 4).

Discussion
To the best of our knowledge, this is the first study to report forensic MPS-based data in a Mexican population, and the third report from Latin America, after the Peruvian and El Salvador population reports [6,20]. The Peruvian study describes forensic parameters of aSTRs and iiSNPs obtained with the same forensic MPS platform, but it did not report X-STR and Y-STR population data [6]. Thus, given the scarce number of available genomic databases, our MPS-based data will be useful for forensic interpretation purposes in Latin America.
The number of different alleles found in this study (528 different STR alleles) was higher than those reported in Spanish [3], French [4], Asian [5], and Peruvian [6] populations. Also, our RSB results markedly improve the combined power of discrimination and chance of exclusion compared with those previously obtained in Monterrey City with 23 aSTRs genotyped by CE [21]. Similarly, our sequencing results for 24 Y-STRs improved the Y-linked genetic informativeness for Monterrey City relative to the previous report with 23 Y-STRs based on CE, where a discriminatory capacity of 93.75% was described [22]. Although a larger population sample could reveal more allele diversity and produce more accurate estimates, we were able to observe as many as 367 different aSTR alleles, which is similar or superior to previous studies with the DPS-A of the ForenSeq™ DNA Signature Prep [3][4][5].
We must clarify that this study was not designed to carry out a formal evaluation of HWE and LE agreement in the studied Mexican population, which is especially true for genomic systems including highly polymorphic markers analyzed by MPS, such as the STRs. Although we first used the Bonferroni correction for multiple HWE and LE testing, as recently reported in El Salvador for MPS data [20], we also implemented the truncated product method and p-p plots previously applied in forensic population studies to evaluate the hypothesis that none of the loci departed from HWE and LE [23][24][25] (Figs. 2, 3 and 4). For instance, these additional tests allowed us to conclude that the significant HWE p-values observed in two X-STRs are unexpected in the whole dataset (SM8-SM9), probably given their high proportion among the limited number of X-STRs analyzed (2/7 = 28.57%), which limits the use of the square of the binomial formulas to estimate X-STR genotype frequencies. However, the moderate significance of these HWE p-values (range: 0.028-0.039) and the small population sample studied herein do not justify omitting DXS7423 and HPRTB from forensic casework. In addition, the simplest formal estimation of LR values in kinship analysis with the software FamLinkX will require further analyses of these seven X-STRs [26], such as RSB haplotype frequencies from a large male population sample.
Four iiSNPs that showed HWE disagreement in the Fisher exact tests are related to the clear LD detected in the whole iiSNP dataset (SM14; Fig. 4). Thus, the omission of rs6444724, rs338882, rs6955448, and rs1493232 is recommended. Similarly, the following iiSNPs that showed consistent LD should also be removed: rs159606, rs2107612, and rs7041158. Conversely, the HWE and LE agreement demonstrated in aSTRs for LB and RSB variation (Figs. 2 and 3) supports the use of the square of the binomial formulas and the product rule to estimate DNA profile frequencies based on the ForenSeq™ DNA Signature Prep kit in this Mexican population, which is in agreement with previous reports of aSTRs in Monterrey City [18]. However, regardless of the results of independence testing, the admixture and substructure among Mexican populations are well known [27], especially considering that Monterrey City is the second most important metropolis in Mexico, located in Nuevo Leon state, on the border with the USA. Consequently, taking into account the NRC II recommendations [28], multi-locus profile probabilities for criminal investigations should be computed by the Balding and Nichols method [29] using conservative values of the inbreeding coefficient (θ or FST). Based on the interpopulation differentiation previously estimated among seven Mexican populations from 20 aSTRs (FST = 0.002; p = 0.0000) [27], an FST value of 0.01 will be conservative enough for forensic casework in this country.
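The Balding and Nichols correction mentioned above can be sketched in Python with the conditional match probabilities in the form given in NRC II recommendation 4.10. The function names and allele frequencies are illustrative assumptions; θ = 0.01 is the value suggested in the text.

```python
import math


def match_probability(p_i, p_j=None, theta=0.01):
    """Single-locus match probability with the Balding-Nichols theta correction.

    Pass only p_i for a homozygote (A_i A_i), or p_i and p_j for a heterozygote.
    With theta = 0 this reduces to p_i**2 and 2 * p_i * p_j, respectively.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        return (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i) / denom
    return 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j) / denom


def profile_probability(loci, theta=0.01):
    """Product rule over loci; each item is (p_i,) or (p_i, p_j)."""
    return math.prod(match_probability(*freqs, theta=theta) for freqs in loci)


# Hypothetical allele frequencies at two loci: one homozygote, one heterozygote
toy_profile = [(0.1,), (0.1, 0.2)]
prob_plain = profile_probability(toy_profile, theta=0.0)
prob_theta = profile_probability(toy_profile, theta=0.01)
```

The corrected probability is larger than the uncorrected one for alleles at these frequencies, so the resulting estimate is more conservative from the defendant's point of view, which is the purpose of choosing a θ above the value estimated for Mexican populations.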
This study follows the ISFG recommendation regarding the generation of STR allele population databases [13], which are required for biostatistical interpretation in kinship analysis and criminal cases where the ForenSeq™ DNA Signature Prep kit is used. For instance, in a recent complex paternity case reported in Mexico, the suitability of a forensic genomic platform was invoked [30]. However, the largest social impact of genomic platforms is expected in criminal investigation, given the 95,121 missing persons and 52,004 unidentified bodies reported in Mexico up to 2021 [31]. Consequently, the genomic population database reported herein will be helpful for this task in Mexico, and also for other Latin American populations analyzed with the ForenSeq™ DNA Signature Prep kit that lack a proper population database.

Conclusions
In brief, we report allele frequencies and forensic statistical parameters of STRs and iiSNPs based on the analysis of the ForenSeq™ DNA Signature Prep kit in the Monterrey City population (Northeast, Mexico). Our sequencing results describe the internal repeat variation of different STRs (aSTRs, X-STRs, and Y-STRs), in addition to iiSNPs. The reported genomic population database provides valuable information for statistical interpretation in forensic casework in Mexico and probably in some other Latin American populations.