Sample Collection and Genomic DNA Extraction
Total genomic DNA was isolated from soft tissues (muscle, heart, liver, and kidney) and blood samples, depending on the availability of material collected by the Mammal Research Institute, Polish Academy of Sciences in Białowieża between 1990 and 2016. Before DNA extraction, the samples (blood, liver, muscle, and skin) were preserved at -20°C. A total of 150 European bison (126 males, 24 females) were included in this study: 84 cases and 66 controls, comprising 41 non-affected males and 79 males with confirmed posthitis; the remainder were males of undefined disease status and females. DNA was extracted using the following commercial total DNA isolation kits, according to the manufacturers' guidelines: the Syngen DNA Mini Kit (spin-column protocol; Syngen, Wrocław, Poland), the Qiagen DNeasy® Blood & Tissue Kit (spin-column protocol), and the Sherlock AX Kit (DNA-precipitation protocol; A&A Biotechnology, Gdańsk, Poland). Because most of the available material consisted of blood samples, DNA from blood was obtained using the phenol-chloroform extraction method.
Enrichment Library Preparation and High-Throughput Sequencing (SureSelectXT Target Enrichment System)
In this project, we studied genes and regions previously suggested to be significantly associated with posthitis [2], as well as other specific regions that might be associated with the disease [32, 39, 40]. A total of 74 regions corresponding to 74 genes (Supplementary Table S2) were captured, distributed across chromosomes 1, 9, 12, 13, 15, 23, 25, 26, 29, and X. The custom capture design falls within the 3-5.9 Mb target-size tier (SureSelectXT Custom Target Enrichment Kit, Agilent). Library construction and sequencing were performed according to the manufacturer's protocols. High-throughput sequencing was performed on an Illumina HiSeq 4000 platform using 100 bp paired-end runs at BGI (www.bgi.com).
Read alignment, variant calling and filtering
All reads were adapter-trimmed and quality-filtered with default parameters using Trim Galore [49] (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Read quality was checked before and after trimming using the FastQC tool [50], and quality-check reports were generated for all FASTQ files. The filtered reads were then mapped onto the Bos taurus reference genome (UMD3.1) using BWA-MEM version 0.7.17 [51, 52], producing a binary alignment (BAM) file for each sample. Read-group information was added with the BWA-MEM program during the alignment step.
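The trimming and alignment steps above can be sketched as shell commands. This is an illustrative sketch only: file names, thread counts, and read-group fields are hypothetical placeholders, not the exact values used in the study.

```shell
# QC reports before trimming
fastqc sample_R1.fastq.gz sample_R2.fastq.gz

# Adapter/quality trimming with Trim Galore (default parameters);
# --fastqc regenerates QC reports on the trimmed reads
trim_galore --paired --fastqc sample_R1.fastq.gz sample_R2.fastq.gz

# Align to the Bos taurus UMD3.1 reference with BWA-MEM,
# embedding read-group information via -R (placeholder fields)
bwa index umd3.1.fa
bwa mem -t 8 -R '@RG\tID:sample1\tSM:sample1\tLB:lib1\tPL:ILLUMINA' \
    umd3.1.fa sample_R1_val_1.fq.gz sample_R2_val_2.fq.gz \
  | samtools view -b -o sample1.bam -
```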
Data pre-processing and variant discovery [53, 54, 55] were performed following the GATK Best Practices (https://software.broadinstitute.org/gatk/best-practices/) using GATK [54]. Aligned reads were sorted by query name using SortSam. The GATK FixMateInformation program was used to ensure consistent mate information for both reads in each pair in the query-name-sorted alignment file; the SortSam module was then used to re-sort the output by genomic coordinate. Duplicate reads were marked using GATK's MarkDuplicates, and SetNmMdAndUqTags was used to fix the NM, MD, and UQ tags. The final BAM file for each sample was verified using GATK's ValidateSamFile.
After pre-processing, GATK's HaplotypeCaller [54] was run on each individual BAM file in GVCF (genomic variant call format) mode to produce a separate gVCF file per individual, restricted to the targeted regions (genomic intervals) with 100 bp interval padding. The interval list, corresponding to the genomic regions targeted during library preparation, was used in the variant-calling steps; such a list may also be provided by the kit manufacturer. For targeted or exome sequencing approaches, the GATK documentation recommends adding interval padding, usually 100 bp (https://gatk.broadinstitute.org/). GATK's CombineGVCFs was used to merge the per-sample gVCF files into a single gVCF, and GenotypeGVCFs was then used to convert this gVCF into a VCF file containing all raw variants for downstream analysis. GATK's SelectVariants module was used to extract the raw SNPs and indels from the combined VCF file, and these were then subjected to hard filtering. Hard filtering discards variants that fall below specific thresholds for properties such as variant confidence, root mean square of the mapping quality, and strand bias.
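The pre-processing and variant-discovery steps described above can be sketched as GATK4 commands. File names and the two-sample merge are illustrative placeholders; the actual run covered all samples.

```shell
# Pre-processing per sample (placeholder file names)
gatk SortSam -I sample1.bam -O sample1.qsort.bam -SO queryname
gatk FixMateInformation -I sample1.qsort.bam -O sample1.fixmate.bam
gatk SortSam -I sample1.fixmate.bam -O sample1.csort.bam -SO coordinate
gatk MarkDuplicates -I sample1.csort.bam -O sample1.dedup.bam -M sample1.dup_metrics.txt
gatk SetNmMdAndUqTags -I sample1.dedup.bam -O sample1.final.bam -R umd3.1.fa
gatk ValidateSamFile -I sample1.final.bam

# Per-sample gVCF calling restricted to the capture targets, with 100 bp padding
gatk HaplotypeCaller -R umd3.1.fa -I sample1.final.bam -ERC GVCF \
    -L targets.interval_list -ip 100 -O sample1.g.vcf.gz

# Joint genotyping across samples, then splitting SNPs and indels
gatk CombineGVCFs -R umd3.1.fa -V sample1.g.vcf.gz -V sample2.g.vcf.gz -O cohort.g.vcf.gz
gatk GenotypeGVCFs -R umd3.1.fa -V cohort.g.vcf.gz -O cohort.vcf.gz
gatk SelectVariants -V cohort.vcf.gz --select-type-to-include SNP -O cohort.snps.vcf.gz
gatk SelectVariants -V cohort.vcf.gz --select-type-to-include INDEL -O cohort.indels.vcf.gz
```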
The variant annotations used for filtering characterize low-level properties of variants derived from information in the BAM file: quality (QUAL), depth (DP), Fisher strand bias (FS), root mean square of the mapping quality of reads supporting a variant call (MQ), quality by depth (QD), and the Mann-Whitney-Wilcoxon rank-sum tests MQRankSum, ReadPosRankSum, BaseQRankSum, and ClippingRankSum.
GATK's VariantFiltration module was used to select and filter high-quality SNPs (-filter-name "QD2" -filter "QD < 2.0" -filter-name "FS60" -filter "FS > 60.0" -filter-name "MQ40" -filter "MQ < 40.0" -filter-name "SOR3" -filter "SOR > 3.0" -filter-name "MQRankSum-12.5" -filter "MQRankSum < -12.5" -filter-name "ReadPosRankSum-8" -filter "ReadPosRankSum < -8.0" -filter-name "QUAL30" -filter "QUAL < 30.0"); indels were filtered analogously.
After hard filtering, variants not flagged as "PASS" by GATK were excluded from downstream analyses using BCFtools [56]. The resulting VCF file was used for GWAS analysis in PLINK 1.9 [57, 58] and for variant annotation with Ensembl's Variant Effect Predictor (VEP) tool.
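The SNP hard filters and the subsequent PASS-only selection can be sketched as follows; file names are hypothetical placeholders, and the thresholds are the ones listed above.

```shell
# Apply the hard-filter thresholds to the SNP call set
gatk VariantFiltration -V cohort.snps.vcf.gz \
    --filter-name "QD2"              --filter-expression "QD < 2.0" \
    --filter-name "FS60"             --filter-expression "FS > 60.0" \
    --filter-name "MQ40"             --filter-expression "MQ < 40.0" \
    --filter-name "SOR3"             --filter-expression "SOR > 3.0" \
    --filter-name "MQRankSum-12.5"   --filter-expression "MQRankSum < -12.5" \
    --filter-name "ReadPosRankSum-8" --filter-expression "ReadPosRankSum < -8.0" \
    --filter-name "QUAL30"           --filter-expression "QUAL < 30.0" \
    -O cohort.snps.filtered.vcf.gz

# Keep only records flagged PASS
bcftools view -f PASS -O z -o cohort.snps.pass.vcf.gz cohort.snps.filtered.vcf.gz
```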
Genome-wide association analysis (GWAS)
Before converting the final VCF file to PLINK 1.9 format, only bi-allelic single nucleotide polymorphism (SNP) variants were retained using BCFtools, as PLINK cannot utilize multi-allelic SNPs. The filtered VCF file, containing only QC-passing variants, was converted into PLINK binary-format files (*.bed, *.bim, *.fam) using PLINK. PLINK was then used to retain only male bison, and GWAS was performed on this subset. A principal component analysis was also performed and plotted.
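These conversion and subsetting steps can be sketched as follows; file names are hypothetical, and the male-only step assumes sex is coded in the .fam file.

```shell
# Retain biallelic SNPs only (PLINK cannot use multi-allelic sites)
bcftools view -m2 -M2 -v snps -O z -o cohort.biallelic.vcf.gz cohort.snps.pass.vcf.gz

# Convert to PLINK binary format (.bed/.bim/.fam)
plink --vcf cohort.biallelic.vcf.gz --allow-extra-chr --make-bed --out bison

# Keep male animals only (requires sex information in the .fam file)
plink --bfile bison --filter-males --make-bed --out bison_males
```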
Statistical Analysis
PLINK was used to perform quality control with several parameters (--mind 0.1 --geno 0.1 --maf 0.05 --hwe 1e-6). Before the association analysis, we excluded SNPs with a minor allele frequency below 5% (MAF < 0.05), more than 10% missing genotypes, or a Hardy-Weinberg equilibrium (HWE) p-value < 1e-6, as well as individuals with more than 10% missing genotypes.
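Combined into a single PLINK invocation, the quality-control thresholds above look as follows (file names are placeholders):

```shell
# --mind: individual missingness; --geno: SNP missingness;
# --maf: minor allele frequency; --hwe: Hardy-Weinberg p-value threshold
plink --bfile bison_males --mind 0.1 --geno 0.1 --maf 0.05 --hwe 1e-6 \
      --make-bed --out bison_qc
```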
GWAS was conducted using logistic regression under an additive genetic model, implemented in PLINK 1.9, with a 95% confidence interval (CI). For --model and case/control --assoc analyses, '--ci X' causes a size-X centred CI to be reported for the odds ratio (e.g., "--ci 0.95" corresponds to a 95% confidence interval).
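A minimal sketch of the association command (placeholder file names; PLINK's --logistic defaults to an additive model):

```shell
# Case/control logistic regression with 95% CIs reported for the odds ratio
plink --bfile bison_qc --logistic --ci 0.95 --out gwas_posthitis
```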
The Bonferroni correction has been used to adjust for multiple testing in several studies [59, 60, 61]; the genome-wide significance level is then calculated as p < 0.05/n, where n denotes the number of tested single nucleotide polymorphisms (SNPs). Owing to the high level of inbreeding and the limited sample size, a Bonferroni-corrected genome-wide significance level was too conservative to be applied in our case; hence, SNPs with p < 0.005 and OR > 1 were considered significantly associated with the trait [62]. We used the odds ratio (OR), understood as the ratio between the odds of an event occurring in the presence versus the absence of a given variable, as the measure of association [63].
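As a worked example of the Bonferroni threshold arithmetic, with a hypothetical SNP count (the actual n is determined by the filtered dataset):

```shell
# Hypothetical post-QC SNP count; the real n comes from the data
n=12000
# Bonferroni-style genome-wide significance threshold: 0.05 / n
awk -v n="$n" 'BEGIN { printf "%.2e\n", 0.05 / n }'   # prints 4.17e-06 for n = 12000
```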
Annotation of Variants
The Variant Effect Predictor (VEP) tool [36] was used to annotate the genetic variants; it is the most widely used tool for annotating genomic variation found in high-throughput sequencing data. BCFtools was used to retain only biallelic variants from the VCF file generated after hard filtering, and variant annotation was performed on the resulting VCF file both with and without additional variant filtering. All 150 samples (both male and female) were used for variant annotation. The default Bos taurus genome assembly (ARS-UCD 2.1) was selected for annotation in VEP.
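A minimal command-line sketch of the VEP step, assuming a locally installed annotation cache (file names are placeholders):

```shell
# Annotate the biallelic variant set with Ensembl VEP using the cached
# Bos taurus annotation (assumes the cache has been downloaded beforehand)
vep -i cohort.biallelic.vcf.gz -o annotated.txt \
    --species bos_taurus --cache --force_overwrite
```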
VCFtools [53] was used to implement additional variant filtering with the following parameters: mean read depth between 10 and 60 (--min-meanDP 10 --max-meanDP 60); only biallelic variants, by setting the minimum and maximum numbers of alleles to two (--min-alleles 2 --max-alleles 2); exclusion of sites with more than 20% missing data across all samples (--max-missing 0.8); exclusion of SNPs with quality scores below 20 (--minQ 20); and a minor allele frequency threshold of 5% (--maf 0.05).
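Combined into one VCFtools invocation, with placeholder file names:

```shell
# Additional site-level filtering; flags as listed in the text
vcftools --gzvcf cohort.snps.pass.vcf.gz \
    --min-meanDP 10 --max-meanDP 60 \
    --min-alleles 2 --max-alleles 2 \
    --max-missing 0.8 --minQ 20 --maf 0.05 \
    --recode --recode-INFO-all --out cohort.addfilt
```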
Principal Component Analysis (PCA)
Principal component analysis (PCA) was carried out on the quality-filtered dataset using PLINK 1.9's --pca command, producing output files with the first 20 ordered eigenvectors and the corresponding eigenvalues. The first two components were used for subsequent analysis and visualization.
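The PCA step can be sketched as follows (placeholder file names):

```shell
# First 20 principal components from the QC-filtered genotypes
plink --bfile bison_qc --pca 20 --out bison_pca
# bison_pca.eigenvec holds per-sample PCs; bison_pca.eigenval the eigenvalues
```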
Juxtaposition of SNPs Obtained from Target Enrichment, Reference Pipeline, and Bovine High-Density SNP Chip Tool
We compared the number of variants obtained from the different pipelines and checked the uniqueness of the variants identified in our previous work [38] against the Bovine High-Density SNP Chip Tool.