Selection and implementation of SNP markers for parentage analysis in a Chinese crossbred cattle population

By combining direct sequencing of pooled DNA with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) genotyping of individuals, a panel of 50 highly informative single nucleotide polymorphisms (SNPs) for parentage analysis was developed in a crossbred Chinese cattle population. The average minor allele frequency (MAF) was 0.43, and the cumulative exclusion probabilities for single-parent and both-parent inference reached 0.99797 and 0.999999, respectively. The marker set was then used for parentage verification in a group of 81 trios with the likelihood-based parentage-assignment program of the Cervus software. Compared with on-farm records, the results showed that this 50-SNP system could provide sufficient and reliable information for parentage testing, with the parentage errors detected for mother-offspring and sire-offspring relationships being 8.6% and 18.5%, respectively. Based on these results, we provide a low-cost and efficient SNP assay for paternity testing in the crossbred cattle population of Simmental and Holstein in China.


Introduction
It is common knowledge that the success of genetic parameter estimation and genetic evaluation in national or international cattle breeding systems is directly affected by the accuracy of pedigrees.
However, incorrect parentage records appear from time to time because of the frequent use of artificial insemination (AI), poorly maintained reproduction records, and inaccurate pedigree records [1]. Sanders et al. [2] estimated the substantial influence of wrong and missing sire information on the reliability of estimated breeding values and on genetic gain, especially for sires with small progeny groups and for traits with low heritability. A 10% error rate in paternity determination would reduce the genetic gain per year by 4.3% [3]. Therefore, parentage testing, an essential tool for correcting pedigree errors, has become an important element in both breeding practice and research.
Recent advances in deoxyribonucleic acid (DNA) marker research have made single nucleotide polymorphisms (SNPs) increasingly popular, and large numbers of dairy cattle are routinely genotyped with dense SNP chips or re-sequenced at the whole-genome level for genomic selection [4] and genome-wide association studies [5], which also provides useful information for parentage assignment. Many studies have shown significant progress in the utility of SNPs in cattle pedigree tests and some of [...]
In the past decades, crossbreeding has been extensively practiced worldwide to improve milk production, milk composition, fertility, and calving ease [10,11,12,13]. In China, crossbreeding between Simmental and Holstein is widely used to improve overall economic benefits and to reduce the inbreeding coefficient [14,15]. However, parentage errors occur more frequently in crossbred populations, not only because the causes seen in purebred populations also apply but also because too little attention is paid to pedigree registration. In addition, no study has specifically established a parentage testing system for a Chinese crossbred population. The aim of the present work was to select a set of informative SNPs and to evaluate their potential utility for parentage testing in a Chinese crossbred population derived from two breeds, Simmental and Holstein, thereby providing a candidate SNP panel for farmers and researchers running DNA analyses for paternity assignment in this crossbred population in China.

Animals and DNA extraction
Seventy-five family trios, including calves and their registered parents, were used in the present study. The animals comprised: 1) 12 bulls from 2 breeds (4 Chinese Holstein and 8 Chinese Simmental); 2) 75 Chinese Holstein cows; and 3) 81 progeny, including 3 half-sibs and 3 full-sibs, of which 38 were Simmental crossbreds and the other 43 were pure Holstein. In total, 168 cattle were analyzed in the study.

Selection of SNP markers
SNP markers were pre-selected from previous reports [16,17] and from Illumina BovineHD BeadChip data in Chinese Holstein cattle [18]. The selection criteria were: 1) a reported minor allele frequency (MAF) greater than 0.3, and 2) a genetic distance greater than 5 cM between markers on the same chromosome. The SNPs were then re-selected according to their heterozygosity, which was predicted by direct sequencing of polymerase chain reaction (PCR) products amplified from pooled DNA, as in a previous study [19]. Only SNPs with an allele peak height ratio over 1:3 were considered highly heterozygous and were retained preferentially. Finally, 59 promising, highly polymorphic SNP markers were selected.
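The two pre-selection criteria can be illustrated with a minimal sketch. The tuple layout, the function name `preselect_snps`, and the greedy left-to-right spacing rule are assumptions made for illustration; the source does not describe how markers were chosen when several candidates fell within 5 cM of each other.

```python
from collections import defaultdict

def preselect_snps(candidates, min_maf=0.3, min_gap_cm=5.0):
    """Keep SNPs with MAF above min_maf, spaced more than min_gap_cm apart.

    candidates: iterable of (snp_id, chromosome, position_cm, maf) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos_cm, maf in candidates:
        if maf > min_maf:  # criterion 1: reported MAF greater than 0.3
            by_chrom[chrom].append((pos_cm, snp_id))

    selected = []
    for chrom in sorted(by_chrom):
        last_pos = None
        # Criterion 2 (assumed greedy pass): keep a SNP only if it lies more
        # than min_gap_cm from the previously kept SNP on the same chromosome.
        for pos_cm, snp_id in sorted(by_chrom[chrom]):
            if last_pos is None or pos_cm - last_pos > min_gap_cm:
                selected.append(snp_id)
                last_pos = pos_cm
    return selected

# Made-up candidates, not data from the study.
example = [
    ("snpA", 1, 10.0, 0.45),
    ("snpB", 1, 12.0, 0.48),  # within 5 cM of snpA, dropped
    ("snpC", 1, 20.0, 0.35),
    ("snpD", 2, 5.0, 0.20),   # MAF too low, dropped
    ("snpE", 2, 8.0, 0.41),
]
print(preselect_snps(example))
```

A greedy pass keeps the first marker encountered in each 5 cM window; a real selection would more likely prefer the marker with the higher MAF or heterozygosity within each window.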

Genotype detection and parentage analysis
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was used to genotype the 59 markers in the 168 individuals. Primers for the multiplex assays were designed using Assay Design Software (version 3.1). For quality control, 10 pairs of duplicate samples were included. Basic genetic parameters for each SNP, including MAF, observed heterozygosity (H_O), expected heterozygosity (H_E) and polymorphic information content (PIC), as well as the paternity index, were calculated with the Cervus 3.0 software [20]. Parentage analysis was carried out using the likelihood method.
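For readers unfamiliar with these summary statistics, the following sketch computes them for a single SNP using the standard textbook definitions. It is only an illustration: the function name and input format are hypothetical, and Cervus may apply estimators (for example sample-size corrections) that differ from the plain formulas used here.

```python
from collections import Counter

def snp_summary(genotypes):
    """Summarise one biallelic SNP from genotype strings such as "AG".

    Missing calls are passed as None and ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    n = len(called)

    # Allele frequencies from allele counts (two alleles per animal).
    allele_counts = Counter(allele for g in called for allele in g)
    freqs = sorted(count / (2 * n) for count in allele_counts.values())

    maf = freqs[0] if len(freqs) > 1 else 0.0      # minor allele frequency
    h_obs = sum(g[0] != g[1] for g in called) / n  # observed heterozygosity
    h_exp = 1.0 - sum(f ** 2 for f in freqs)       # expected heterozygosity

    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = h_exp - cross
    return {"maf": maf, "h_obs": h_obs, "h_exp": h_exp, "pic": pic}

# Made-up genotypes, not data from the study.
print(snp_summary(["AA", "AG", "AG", "GG", None]))
```

For a biallelic marker, H_E peaks at 0.5 and PIC at 0.375 when both alleles have a frequency of 0.5, which is why markers with a MAF close to 0.5 are preferred for parentage panels.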

Results and discussion
As a cost- and time-effective design, DNA pooling has been used in many kinds of research, such as SNP detection [21], allele frequency estimation [22], QTL mapping [23], and genome-wide association scans [24]. In the present study, direct sequencing of pooled DNA was chosen to evaluate SNP heterozygosity, and 59 highly informative markers were finally selected.
The MALDI-TOF MS method was then used to genotype the 59 SNPs in all 168 cattle. To assess the accuracy of this platform, 10 samples were genotyped in duplicate and 936 data points were gathered, of which only one duplicate pair at one SNP was not identical, giving a genotyping error rate of 0.002 for the MALDI-TOF MS method. A similar SNP genotyping error rate was reported by Heaton et al. [25] in sheep. In addition, according to Cooper et al. [26], the animal call rate is also related to genotyping accuracy; the individual call rate ranged from 96% to 99% in our study. Together, these results indicate that the genotyping data obtained in the present study were reliable and that MALDI-TOF MS is a stable platform for genotyping these SNPs [27,28].
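The duplicate-sample check described above amounts to counting discordant genotype pairs. A minimal sketch follows; the function name, the pair layout, and the choice to skip pairs with a missing call are assumptions, since the source does not state how missing calls were handled in this comparison.

```python
def duplicate_discordance(pairs):
    """Return (mismatches, compared, rate) for duplicate genotype calls.

    pairs: list of (call_run1, call_run2) for the same sample and SNP,
    with None for a missing call. Pairs with a missing call are skipped.
    """
    compared = [(a, b) for a, b in pairs if a is not None and b is not None]
    # Sort alleles so that "AG" and "GA" count as the same genotype.
    mismatches = sum(sorted(a) != sorted(b) for a, b in compared)
    rate = mismatches / len(compared) if compared else float("nan")
    return mismatches, len(compared), rate

# Made-up duplicate calls, not data from the study.
pairs = [("AA", "AA"), ("AG", "GA"), ("GG", "AG"), (None, "AA")]
print(duplicate_discordance(pairs))
```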
Among the 59 SNPs, 9 had a call rate lower than 85% or deviated from Hardy-Weinberg equilibrium and were excluded from further analysis. Consequently, 8400 genotypes from 50 SNP markers were retained (Table S1). These 50 SNPs were distributed across 27 autosomes (Fig. 1a), with an average call rate of 97.01% (8160/8400). The MAF ranged from 0.27 to 0.50, with an average of 0.43 in the total population (Fig. 1b). The percentage of SNPs with a MAF between 0.45 and 0.50 was 46%, and 74% of SNPs had a MAF ranging from 0.40 to 0.45. Ninety-six percent of the SNPs had a MAF higher than 0.3, suggesting that these 50 markers were highly informative and could serve as good genetic markers for parentage analysis in our crossbred population [25].
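The marker filter described above can be sketched as follows. The 85% call-rate cut-off comes from the text; the Hardy-Weinberg test shown here is a plain chi-square goodness-of-fit test with one degree of freedom, and the 5% critical value of 3.841 is an assumption, because the source does not state which test or significance threshold was applied.

```python
def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def passes_qc(genotypes, counts, min_call_rate=0.85, chi2_crit=3.841):
    """Keep a SNP if its call rate is at least 85% and HWE is not rejected.

    counts: (n_AA, n_AB, n_BB) genotype counts among called animals.
    chi2_crit=3.841 is the 5% critical value for 1 d.f. (assumed threshold).
    """
    return (call_rate(genotypes) >= min_call_rate
            and hwe_chi_square(*counts) <= chi2_crit)

# Made-up counts: perfect HWE proportions versus a complete lack of heterozygotes.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

With small samples or rare genotypes, an exact test is usually preferred over the chi-square approximation.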
Furthermore, we used the 50 markers to construct the parentage test, and the cumulative probability [...] However, few of the SNPs used in the present study were the same as those in the above studies. The main reason why SNPs for paternity testing vary considerably between populations is that the choice of core markers depends closely on the heterozygosity and call rate of the SNPs in the test population, so it is necessary to develop an SNP panel with sufficient power to identify individuals and their parents in a given population [7]. Therefore, the present work addresses the lack of paternity identification markers specific to the crossbred population of Simmental and Holstein in China.
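To show how a cumulative exclusion probability grows with the number of markers, the sketch below derives per-locus exclusion probabilities for a biallelic SNP from first principles and combines them across loci. It assumes Hardy-Weinberg proportions, unrelated candidate parents, and independent loci. Interpreting "both-parent" inference as exclusion of a candidate parent pair is also an assumption, and the code is not claimed to match the exact computation performed by Cervus.

```python
def single_parent_exclusion(p):
    """Probability of excluding a random unrelated candidate parent
    when the other parent's genotype is unknown (biallelic SNP)."""
    q = 1.0 - p
    # Exclusion needs opposing homozygotes: offspring AA with candidate BB,
    # or offspring BB with candidate AA.
    return 2.0 * p ** 2 * q ** 2

def parent_pair_exclusion(p):
    """Probability of excluding a random unrelated candidate parent pair."""
    q = 1.0 - p
    excl_aa = p ** 2 * (1.0 - (1.0 - q ** 2) ** 2)  # offspring AA, a candidate is BB
    excl_bb = q ** 2 * (1.0 - (1.0 - p ** 2) ** 2)  # offspring BB, a candidate is AA
    excl_ab = 2.0 * p * q * (p ** 4 + q ** 4)       # offspring AB, both AA or both BB
    return excl_aa + excl_bb + excl_ab

def cumulative_exclusion(per_locus):
    """Combine independent loci: 1 minus the product of non-exclusion terms."""
    non_exclusion = 1.0
    for pe in per_locus:
        non_exclusion *= 1.0 - pe
    return 1.0 - non_exclusion

# Hypothetical panel: 50 SNPs that all sit at the reported mean MAF of 0.43.
mafs = [0.43] * 50
print(cumulative_exclusion(single_parent_exclusion(m) for m in mafs))
print(cumulative_exclusion(parent_pair_exclusion(m) for m in mafs))
```

Because the real panel has MAFs ranging from 0.27 to 0.50 rather than a uniform 0.43, this sketch is not expected to reproduce the reported values exactly.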
The set of 50 SNP markers was then used to verify parentage in the population of 168 individuals. The results of the parentage inference analyses for the 81 calves are summarized in Table 1. Of the 81 offspring-mother relationships, 79 had a confidence > 85%, including 61 with a confidence ≥ 95%. For paternity inference, 66 calves (81.5%) had a confidence ≥ 95%, and 76 had a confidence over 85%. In detail, the seven calves with a confidence level lower than 85%, and those whose inferred parents at a confidence above 85% differed from the recorded ones, were considered to have erroneous parentage records in this study. The reason for this was that their real mother or father was not sampled in this experiment, and that two Simmental bulls were closely related, i.e. one sire was the uncle of the other. In total, there were 17 calves whose inferred parents were incompatible with the putative ones, among which 4 had both sire and dam pedigree errors. The parental information of these 17 individuals was rechecked against the on-farm records. The reasons for the parentage errors were analyzed using birth and calving data as well as insemination records. Six calves had incorrect ear tags, which confounded the tracing of their parents. On this farm, newborn calves are ear-tagged once a week rather than immediately after birth, which caused parentage errors in some calves. Three individuals were misidentified because AI technicians recorded the semen or the cow label incorrectly, and the incorrect paternity records of the other 8 calves were due to multiple inseminations using different sires. In Israel, multiple inseminations could explain at most 20% of the rejected paternity [1]; in the present population, however, more than half of the paternity mistakes were due to this reason.
Hence, the comparison confirmed that the pedigree inferred from the developed SNP panel was correct, demonstrating that this SNP paternity testing system is effective and powerful. At the same time, combining on-farm and genotypic data for paternity analysis is an effective option [30].
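The study itself relied on the likelihood-based assignment in Cervus. As a simpler illustration of how a recorded parent can be checked against genotype data, the sketch below swaps in a plain exclusion approach that counts Mendelian incompatibilities (opposing homozygotes) between a calf and its recorded parent. The function names and the tolerance of one mismatch to absorb genotyping error are hypothetical choices, not values from the source.

```python
def is_compatible(offspring, parent):
    """True if offspring and parent share at least one allele at a locus.

    A missing call (None) on either side is treated as uninformative.
    """
    if offspring is None or parent is None:
        return True
    return bool(set(offspring) & set(parent))

def count_mismatches(offspring_genos, parent_genos):
    """Number of loci at which the pair shows opposing homozygotes."""
    return sum(
        not is_compatible(o, p) for o, p in zip(offspring_genos, parent_genos)
    )

def recorded_parent_ok(offspring_genos, parent_genos, max_mismatches=1):
    """Accept the recorded parent if mismatches stay within the tolerance.

    max_mismatches=1 is an assumed allowance for genotyping error.
    """
    return count_mismatches(offspring_genos, parent_genos) <= max_mismatches

# Made-up genotypes at four loci, not data from the study.
calf = ["AA", "AG", "GG", "CT"]
sire_recorded = ["GG", "AA", "AA", "TT"]
sire_inferred = ["AG", "GG", "AG", "CC"]
print(count_mismatches(calf, sire_recorded), recorded_parent_ok(calf, sire_recorded))
print(count_mismatches(calf, sire_inferred), recorded_parent_ok(calf, sire_inferred))
```

Exclusion counting cannot rank several compatible candidates, and it discriminates poorly between close relatives such as the uncle and nephew bulls mentioned above; a likelihood-based approach handles both cases better.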
In the current study, the parentage error rates for sire-offspring and mother-offspring relationships were 18.5% (15/81) and 8.6% (7/81), respectively. The sire-offspring error rate was much higher than the mother-offspring error rate, in line with the reports by Chu et al. [31] and Guo et al. [18] in Chinese Holstein; multiple inseminations using different bulls may account for this result. Many studies have also reported paternity errors in other countries. In the United Kingdom (UK), Visscher et al. [32] reported an overall paternity error rate of 10% in the dairy population. Similarly, a 7% paternity error rate was found in the Angeln dairy cattle population of Germany [2]. In Kenya, the sire misidentification rate in Boran cattle was even over 50% [33], much higher than in the present study. Therefore, parentage testing, as an essential tool for correcting pedigrees, is extremely important for both breeding and practice, and efforts should be made to improve the accuracy of pedigree records in the cattle industry. More and more studies have reported SNP panels for parentage verification in different cattle populations, for instance Red Sindhi cattle in Brazil [34], Brahman cattle in Costa Rica [35], Angus beef cattle in the United States of America [8], and the Holstein population in Mexico [36]. Our results also showed that the set of 50 SNP markers could be a practical tool for correcting pedigrees in the crossbred population of Simmental and Holstein in China.

Fig. 1. The chromosome distribution (a) and MAF distribution (b) of the 50 SNP markers.
