Host Genetics Underlying Pathological Outcomes to Mycobacterium Avium Subsp. Paratuberculosis Infection is Governed by Distinct Genetic Variants


 Bovine paratuberculosis (PTB), caused by Mycobacterium avium subsp. paratuberculosis (MAP), is a chronic granulomatous enteritis that affects cattle worldwide. According to their severity and extension, PTB-associated histological lesions have been classified into the following groups; focal, multifocal, and diffuse. It is unknown whether these lesions represent sequential stages or divergent outcomes. In the current study, the associations between host genetics and pathology were explored by genotyping 813 Spanish Holstein cows with no visible lesions (N= 373) and with focal (N=371), multifocal (N=33), and diffuse (N=33) lesions in gut tissues and regional lymph nodes. DNA from peripheral blood samples of these animals was genotyped with the Bovine EuroG MD Bead Chip, and the corresponding genotypes were imputed to whole-genome sequencing (WGS) data using the 1,000 Bull genomes reference population. A genome-wide association study (GWAS) was performed using the WGS data and the presence or absence of each type of histological lesion in a case-control approach. No overlap was seen in the SNPs controlling the three PTB-associated lesions. A total of 192 and 92 SNPs defining 13 and 9 distinct quantitative trait loci (QTLs) were highly-associated (P ≤ 5 x 10-7) with the multifocal (heritability= 0.075) and the diffuse (heritability= 0.189) lesions, respectively. Some of the identified QTLs overlapped with QTLs previously associated with PTB, bovine tuberculosis, mastitis, and IgG levels. Pathway analysis with candidate genes overlapping the identified QTLs revealed a significant enrichment of the keratinization pathway and cholesterol metabolism in the animals with multifocal and diffuse lesions, respectively. While keratin variants may predispose cows to the development of multifocal lesions, cholesterol variants associate with the more severe lesions. Taken together, these findings suggest that the variation in MAP-associated pathological outcomes might be genetically determined and indicative of distinct host responses in genetically predisposed individuals.


Introduction
Paratuberculosis (PTB) or Johne´s disease is a chronic enteritis of domestic and wild ruminants caused by Mycobacterium avium susbp. paratuberculosis (MAP). PTB is a major problem for animal health and must be noti ed to the World Organization for Animal Health. In Europe and North America, PTB is considered endemic in dairy cattle, with herd prevalence estimates higher than 50 % [1]. PTB causes great economic losses to the dairy industry due to decreased milk production, weight loss, replacement cost, reduced slaughter value, a greater risk to other health problems, premature culling or death from the clinical disease, and the costs of veterinary expenses and control measures [2][3]. Infection occurs in the rst months of life, primarily through the fecal-oral route, but clinical onset only takes place around calving when animals are 18 months or older. The most common clinical signs are decreased milk yield, chronic diarrhea, and progressive weight loss that eventually result in the death of the animal [4].
However, most infected animals do not develop clinical disease, and microbiological and immunological diagnostic tests are not sensitive enough to identify them [5]. MAP has been postulated as a possible trigger factor in several autoimmune diseases in humans such as Crohn's disease (CD) [6], type I diabetes [7], multiple sclerosis [8], or rheumatoid arthritis [9]. It has been hypothesized that some antigenic structures of MAP are cross-reactive to self-proteins and might be responsible for these human autoimmune diseases in genetically predisposed individuals [10]. Colorectal cancer is a complication of the two forms of idiopathic in ammatory bowel disease (IBD); colonic CD and ulcerative colitis.
Interestingly, MAP bacilli have been detected in the intestines of patients with CD, ulcerative colitis, and IBD-associated colorectal cancer [11][12].
MAP causes lesions in naturally and experimentally infected cattle that greatly differ in severity.
According to their extension, cellular in ammatory in ltrate composition, and amount of MAP present in the lesions, PTB-associated lesions were classi ed into the following categories: focal, multifocal, and diffuse (diffuse paucibacillary or lymphoplasmacytic, diffuse intermediate, and diffuse multibacillary or histiocytic) [13][14]. Focal lesions consist of small granulomas in the ileal and jejunal lymph nodes or the ileocaecal lymphoid tissue. In the multifocal lesions, middle size granulomas appear in the apex of some intestinal villi, as well as in the lymph nodes. These middle-size granulomas are formed by groups of macrophages, surrounded by lymphocytes, and do not cause diffuse enteritis or modify the normal architecture of the intestine. In the diffuse paucibacillary or lymphoplasmacytic lesions, lymphocytes are the main cellular type, with some macrophages containing few if any mycobacteria. In diffuse intermediate lesions, the in ltrate is formed by abundant lymphocytes, macrophages, and Langhans giant cells with moderate numbers of acid-fast bacilli. Diffuse multibacillary or histiocytic lesions associate with severe granulomatous enteritis affecting different intestinal locations and are formed mainly by foamy macrophages loaded with cholesterol and large numbers of MAP. Two disease phenotypes, latent and patent, have been established with most infected cows classi ed as latent (with no PTB-associated clinical signs) [5]. Clinical signs and gross lesions are exclusively associated with diffuse lesions.
There is considerable variation among individuals in the response to MAP infection, a proportion of which has been shown to be genetic. In a previous study, we compared the genetic determinants associated with ante-mortem (serum ELISA) and post-mortem (tissue PCR and culture) diagnostic de nitions in a common set of animals using WGS data [15]. Combination test interpretation (all tests negative equals non-infected and all tests positive equals infected) was used to increase the sensitivity of the ELISA-Tissue PCR-tissue culture combination. This strategy reduced the risk of misclassi cation and increased the hereditability of the trait (h 2 = 0.139) when compared with the h 2 of the individual tests (h 2 < 0.08). In the current study, we explored the genetic basis of the PTB-associated pathology and whether the postmortem examination of gut tissues and regional lymph nodes could improve the accuracy of the classi cation of infected animals and uninfected controls providing higher h 2 estimates. For this purpose, variance components and h 2 were estimated for each type of MAP-associated histopathological lesion in Spanish Holstein cattle (N = 813). Subsequently, single nucleotide polymorphisms (SNPs), quantitative trait loci (QTL), candidate genes within signi cant QTLs, and pathways analysis using the identi ed candidate genes were conducted.

Materials And Methods
Ethics statement. Animals used in this study were not submitted to any in vivo experimentation before stunning for slaughter and, therefore, no speci c ethics committee authorization was required. Histopathological examination. Post-mortem tissue sampling was performed as previously described [44]. Brie y, samples from the ileocecal lymph node, jejunal lymph node, ileocecal valve (ICV), jejunum, and terminal ileum were collected aseptically from each animal and placed in formalin within 24 h after arrival at the laboratory. The samples were preferentially taken from areas of the preselected tissues that showed gross lesions, thickened mucosa, and enlarged lymphatic nodes if present. The samples were xed in 10% neutral buffered formalin for 72 h, dehydrated through graded alcohols and xylol, embedded in para n, and cut into 4 μm sections. Sections were, mounted on treated microscope slides, stained with hematoxylin and eosin (HE) and with Ziehl-Neelsen (ZN), and examined by light microscopy for pathological lesions and for the presence of acid-fast bacteria, respectively. According to their location and extension, in ammatory cell type, and mycobacterial load, PTB-associated histopathological lesions were classi ed as focal, multifocal, and diffuse lesions [13].
Genotyping and imputation. Peripheral blood samples were collected at the slaughter time and DNA was extracted using the QIAmp DNA Blood Mini Kit according to the manufacturer´s instructions (Qiagen, Hilden, Germany). Puri ed genomic DNA was quanti ed spectrophotometrically and subsequently genotyped with the Illumina Bovine Euro medium-density (MD) Bead Chip at the molecular genetic laboratory service of the Spanish Federation of Holstein Cattle (CONAFE) using the In niumTM iScan software for allele assignation (Illumina, San Diego, CA). Individual genotypes were phased using Eagle 2.4 [45] and imputed with minimac4 [46] to the Bovine HD Bead Chip using a reference panel of 1,278 Bos taurus from Run7.0 of the 1,000 Bull Genomes project and 581,712 SNPs (ASR-UCD1.2). Imputation to WGS level was then undertaken using a reference population of 2,333 Bos taurus from Run6.0 of the 1,000 Bull Genomes project [47]. In total, 33.77 million SNPs per animal were identi ed across the genome. All the SNPs had a call rate >0.80. PTB-associated SNPs with minimum allele frequency (MAF) < 0.01 were removed. The number of SNPs kept in the analysis was 13,881,067.
Variance components and h 2 estimates. The variance components, standard errors (SE), and h 2 estimates explained by all the SNPs were calculated using the genome-wide complex trait analysis (GCTA) software 1.93.2, according to the following formula: where σ G is the additive genetic effect of the individuals and σ e is the residual variance [48]. The variance components σ G and σ e in the equation were estimated by the genomic-relatedness-based restricted maximum-likelihood (GREML) approach implemented in GCTA. The concept behind this method is to t all the SNPs simultaneously as random effects in a mixed linear model to estimate the phenotypic variance explained by all the SNPs. GCTA implements the method in two steps; rst generating the GRM between individuals and then estimating the variance explained by all SNPs by a Restricted Maximum Likelihood (REML) analysis of the phenotypes with the GRM.
Genome-wide association study (GWAS). Imputed genotypes and the presence or absence of focal, multifocal, and diffuse lesions were analyzed in a case-control study using the mlma (mixed linear model) association analysis of the GCTA 1.93.2 which ts the effects of all the SNPs as random effects [48]. Brie y, the model is y = a + bx + g + e, where y is the phenotype, a is the mean term, b is the additive effect ( xed effect) of the candidate SNP to be tested for association, x is the SNP genotype indicator variable coded as 0, 1 or 2, g is the polygenic effect (random effect) i.e. the accumulated effect of all SNPs (as captured by the GRM calculated using all SNPs), and e is the residual [48]. Control cows did not have visible lesions in gut tissues and had a negative ELISA, tissue PCR, and culture result at the time of slaughter. Age was included as a covariate in the analysis. After the GWAS, the SNPs with R 2 values higher than 70 % were retained. To account for multiple testing, a 5 % chromosome-wise false discovery rate (FDR) was used to determine the probability that the associations were not false positives. P-values between P= 5 × 10 -5 and P= 5 × 10 -7 provided a moderate signi cance level (α), and P-values < 5 × 10 -7 were used to identify highly signi cant associations (Wellcome Trust Case Control Consortium, 2007).
The in ation factor (λ) and quantile-quantile plots were calculated to compare observed distributions of -log (P-values) to the expected distribution under the no association model for each phenotype. λ value close to 1 suggests appropriate adjustment for potential substructure and λ > 1.2 suggests population strati cation. The regression coe cients (b) calculated using GCTA represent the estimated increase in the log odds of the outcome per unit increase in the value of the exposure. In other words, the exponential function of b (e b ) is the odds ratio (OR) associated with a one-unit increase in exposure. In addition, the OR and their 95 % con dence intervals (CI) for the SNPs associated with the presence or absence of each type of histopathological lesion (P < 5 × 10 -7 ) were calculated using logistic regression analysis with the WGassociation function of SNPassoc SNPs that surpassed the suggestive signi cance threshold (P < 5 × 10 -7 ) in a given chromosome. The beginning and end of each QTL were de ned in a window of 500,000 base pairs upstream and downstream by the SNPs that were furthest upstream and downstream of the suggestive SNP.
Overlapping QTL regions were merged and considered as a single QTL. SNP variants, QTLs, and candidate genes identi cation. To explore the genetic basis of the PTBassociated pathology, a GWAS using the imputed WGS datasets and the presence or absence of focal, multifocal, or diffuse lesions was performed. The number of SNPs that surpassed the moderate (between P= 5 × 10 -5 and P= 5 × 10 -7 ) and the high (P< 5 × 10 -7 ) thresholds are presented in Table 1. A total of 277 and 647 SNPs were moderately associated with the multifocal and diffuse lesions, respectively. With the high threshold of association, we identi ed 129 and 92 SNPs highly-associated with the multifocal and the diffuse lesions, respectively. Interestingly, no overlap was seen in the SNPs with high and moderate association with the multifocal or diffuse lesions. As seen in Figure 1A, there was not any SNP surpassing the moderate or the high threshold of association with the focal lesions. In contrast, several chromosomal regions were highly-associated with the multifocal (Fig. 1B) and diffuse (Fig. 1C) lesions.
While SNPs highly associated with the multifocal lesions were located in BTA3, BTA5, BTA11, BTA22, BTA23, and BTA24, associations with the diffuse lesions were found in BTA1, BTA3, BTA7, BTA8, BTA13 and BTA23. For the multifocal and diffuse lesions, most of the highly-associated SNPs were located in intronic regions; 50 and 51 %, respectively. For the multifocal lesions, 18 and 12 % of the SNP were downstream and upstream gene variants, respectively. In contrast, only 2 and 5 % of the SNPs associated with the diffuse lesions were downstream and upstream gene variants and none of the diffuseassociated SNPs were in 3´UTR positions (Supplementary Fig. 1).
A description of the SNPs associated with the multifocal lesions surpassing the threshold (P < 5 × 10 -7 ), P-values, along with candidate genes located within the de ned QTLs are presented in QTLs associated with blood IgG levels (QTL66217, QTL66218, QTL66220, QTL66222) [19]. Genes within this region such as the NFKB Inhibitor Epsilon (NFKBIE) may control the response to several bacterial and viral pathogens as well as vaccines, possibly by in uencing antibody production. The candidate genes associated with the multifocal lesions are novel in the sense that they have not been associated with PTB risk in cattle before with the exception of the Neuronal PAS domain protein 2 (NPAS2) a member of the basic helix-loop-helix (bHLH)-PAS family of transcription factors. NPAS2 is a core component of the circadian clock, an important regulator of a wide array of physiological functions including metabolism, sleep, body temperature, blood pressure, endocrine, immune, cardiovascular, and renal function.
The GWAS de ned 92 SNPS and 9 QTLs with a high association with the diffuse lesions and located in 6 chromosomes (BTA1, BTA3, BTA7, BTA8, BTA13 and BTA23) ( Quantile-quantile plots and odds ratio. Quantile-quantile plots comparing the observed distribution of -log (P-values) to the expected values under the null hypothesis were generated. The plots showed a distribution close to the expected distribution line for the following phenotypes: focal vs undetected lesion (λ median = 1.01689), multifocal vs undetected lesion (λ median = 1.04401), and diffuse versus undetected lesion (λ median = 1.03222), indicating that signi cant values were not overestimated due to population strati cation or cryptic relatedness. A slight deviation in the upper right tails from the y = x line suggested that some association was present in the data. The regression coe cients for all the SNPs associated with the presence of multifocal and diffuse lesions were all positive suggesting a positive correlation between exposure and outcome (data not shown). In addition, the OR calculated for all the SNPs associated with the presence of multifocal and diffuse lesions (P < 5 × 10 -7 ) were > 1 indicating that the animals with the minor alleles were more predisposed to develop multifocal and diffuse lesions (Supplementary Table 1).
Functional GO and metabolic pathways. Functional categorization of the candidate genes for each phenotype (presence or absence of multifocal and diffuse lesions) was performed using the Bioconductor GOseq package. We identi ed 7 GOs and 1 metabolic pathway signi cantly enriched in the animals with multifocal and diffuse lesions, respectively. As showed in Table 4

Discussion
PTB is a multifactorial disease that arises as a result of the interaction of genetic, environmental, and microbial factors leading to the various PTB outcomes. Genetic factors are likely to play an important role in PTB pathogenesis. Cattle infected with MAP display various types of lesions with distinct severity but the associations between host genetic and PTB-associated pathology had not been explored before. Although previous GWAS identi ed loci associated with MAP tissue infection assessed by PCR and culture [26][27], our study provides the rst comparison of the genetic effects associated with three distinct PTB-associated lesions (focal, multifocal, and diffuse) in Spanish Holstein cattle (N = 813). Our study is unique in the de nition of cases and controls through the histopathological analysis of PTB-associated lesions. Using WGS data, heritability estimates were calculated for the focal (h 2 = 0.145), multifocal (h 2 = 0.075), and diffuse (h 2 = 0.189) lesions indicating that the PTB-associated pathology has a genetic component. SNPs associated with the multifocal and diffuse lesions were identi ed which suggests that the control of the PTB in ammatory response is genetically determined and consists, for the most part, of variants with moderate effect located across many chromosomes of the bovine genome. Previous studies indicated that susceptibility to MAP infection is heritable, with h 2 estimates of susceptibility to the disease ranging from < 0.01 [27][28] to 0.2843 [29]. This is comparable with the moderate heritability (h 2 = 0.12) estimated for bTb infection even though the phenotype de nitions and models used differed [21]. In a previous study, Wilkinson et al, genotyped 1966 Holstein-Friesian dairy cows following M. bovis infection with three distinct phenotypes: single intradermal cervical comparative tuberculin (SICCT) test positives with visible lesions, SICCT-positives with undetected visible lesions, and SICCT-negative on multiple tests [30]. Regardless of the case phenotype, chromosome hereditability estimates did not exceed h 2 = 0.08. In the current study, the h 2 estimates were calculated for the presence or absence of PTB-associated histopathological lesions in Spanish Holstein cattle (N = 813). When contrasting these two PTB outcomes, we couldn´t identify any SNP surpassing the FDR < 0.05. However, the strati cation of PTB-associated pathology in three categories allowed the identi cation of SNPs surpassing the FDR < 0.05 and increased the h 2 of the diffuse lesions (h 2 = 0.189) when compared with phenotypes such as the ELISA-tissue PCR-tissue culture (h 2 = 0.139) that typically detect animals with diffuse lesions [15].
The GWAS did not identify any SNP associated with the focal lesions. This could be attributed to the lack of discrimination between no visible and focal lesions. Using RNA-Seq technology we previously identi ed a biomarker of PTB progression, the precursor of the bovine intelectin 2 (ITLN2), which was overexpressed in ICV samples of animals with focal (log 2 fold-change = 10.6) and diffuse (log 2 foldchange = 6.8) lesions compared with animals without visible lesions [31]. More recently, we have demonstrated that the quanti cation of bovine ITLN2 secreting cells by immunohistochemical analysis of ICV sections could constitute a good post-mortem tool, complementary to histopathology, to improve the detection of focal lesions which may sometimes be di cult to detect [32]. In an attempt to test whether the focal lesions were controlled by a speci c genetic, a GWAS comparing animals with focal vs multifocal and focal vs diffuse lesions was performed (data not shown). A total of 28 and 590 SNPs were speci cally associated with each comparison (FDR < 0.05), respectively. These results con rmed signi cant differences in the genetic predisposition to develop focal lesions. Further research with diagnostic methods allowing better discrimination of the animals with focal lesions is needed to con rm these results.
A total of 192 and 92 SNPs de ning 13 and 9 distinct quantitative trait loci (QTLs) were highly associated with the multifocal and diffuse lesions, respectively. All these SNPs had a positive b-value and OR > 1 and were, therefore, associated with the susceptibility to each pathological outcome. No overlap was seen in the SNPs associated with each type of lesion which suggests that distinct genetics control the multifocal and diffuse lesions. Altogether our ndings suggested distinct host predisposition to develop multifocal or diffuse lesions which imply that these lesions represent divergent disease outcomes. In fact, pathway analysis with candidate genes overlapping the lesions-associated QTLs revealed a signi cant enrichment of variants controlling the intermediate lament cytoskeleton and cholesterol metabolism in the animals with multifocal and diffuse lesions, respectively. Our ndings provide for the rst time a potential link between genetic variants in KRT genes (KRT5, KRT7, KRT72, KRT73, KRT74, KRT75, KRT80, KRT81, and KRT83) and PTB. Interestingly, abnormal keratin mutations have been associated with IBD [33]. It was previously reported the sharing of mimicking B cell epitopes between M. leprae and the host KRT7 [34]. Similarly, we hypothesized that molecular mimicry between putative epitopes of KRT and MAP might be playing a role in the development of PTB-associated multifocal lesions in genetically predisposed individuals. However, no correlation between high levels of AKA in plasma and cows with multifocal lesions was observed. On the other hand, KRTs, the major subgroup among the intermediate lament family of cytoskeletal proteins, are responsible for maintaining the stability and integrity of the gastrointestinal epithelium, for providing tissue resilience against many stimuli, and regulating various cellular functions such as cellular proliferation, migration, differentiation as well as in ammatory and immune responses [35]. In a granulomatous experimental model using the teleost Piaractus mesopotamicus infected with BCG, stronger immunostaining with anti-cytokeratin antibodies was observed at 33 days p.i. when epithelioid cells were more evident and the granulomas were fully formed [36]. In tuberculoid (TT) and borderline tuberculoid leprosy, epithelioid non-caseinated granulomas encapsulated by many lymphocytes predominate and acid-fast bacilli are absent or only rarely present [37]. The presence of epithelioid granulomas with multifocal distribution in TT leads to the control of M.leprae replication and the containment of its spread [38]. Further immunohistochemical studies with anti-cytokeratin antibodies are needed to quantify the number of epithelioid cells in PTB-associated multifocal lesions and to reveal the potential role of KRT in maintaining tissue resilience.
While KRT variants may predispose cows to the development of multifocal lesions, variants in candidate genes involved in the cholesterol metabolism were enriched in animals with diffuse lesions, thereby suggesting that cholesterol variants are associated with disease progression. To validate whether the enrichment of SNP variants in candidate genes involved in the cholesterol metabolism was associated with the diffuse lesions (NCEH1/LDLR/ANGPTL8/ANGPTL4), the levels of total cholesterol were measured in plasma samples of cattle with focal, multifocal, or diffuse lesions or with no visible lesions. Our results showed reduced levels of plasma cholesterol in cattle with diffuse lesions (P ≤ 0.001). Similarly, in ammation has been associated with decreased total serum cholesterol levels in patients diagnosed with CD [39] and IBD [40]. This reduction might be due to impedance of absorption of cholesterol e ciently through the thickened intestinal wall in individuals in advances stages of the infection. Using RNA-Seq, we previously detected that the cholesterol route (bta04977) was dysregulated in the ICV of cows with diffuse lesions versus the control group with four upregulated genes matching this route (APOA1, APOC3, APOA4, APOB) [31]. In addition, the top upregulated gene in peripheral blood of animals with focal lesions versus control cows was the cassette subfamily A member 13 (ABCA13), a gene that accelerates cholesterol internalization and accumulation in intracellular vesicles [31][32]41]. In agreement with our data, recent evidence suggests that MAP-infected macrophages accumulate intracellular cholesterol droplets and depict a foam cell phenotype providing an enriched environment for MAP survival [42][43]. Altogether, these ndings suggest increased cholesterol transport, internalization and hydrolysis in macrophages of animals with diffuse lesions, which may also invoke a localized compensatory mechanism to increase cholesterol synthesis at the site of the infection.

Conclusions
While PTB-associated multifocal lesions lead to localized disease, the diffuse lesions present a disseminated form with high bacterial loads. Our results demonstrated that the host predisposition to developing multifocal and diffuse lesions is polygenic with a large number of genetic variants having a moderate effect on the regulation of each observed phenotype. The genetic architecture underlying these two case phenotypes appears to be completely distinct and chromosomes harbor variants that distinguish these two disease outcomes. Manhattan plots showing -log10(P-values) of association between every single SNP and phenotype.
Each dot represents the result from the test association for a single SNP. Animals were considered cases when they had focal (Fig. 1A), multifocal (Fig 1B) or diffuse (Fig 1 C) lesions. Chromosomes localization of the SNPs associated with each type of lesion is indicated on the x-axis. The red horizontal line is drawn at -log10 (5 × 10-7) to show the high level of signi cance. respectively. P-values were calculated using an unpaired t-test. *P value < 0.001.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.