Look Over the Horizon - Chicken Linkage Disequilibrium is Much More Complex Over Much Longer Distance than Previously Appreciated


 BackgroundAppreciable Linkage Disequilibrium (LD) is commonly found between pairs of loci close to one another, decreasing rapidly with distance between the loci. This provides the basis studies to map Quantitative Trait Loci Regions (QTLRs), where it is custom to assume that the closest sites to a significant markers are the prime candidate to be the causative mutation. Nevertheless, Long-Range LD (LRLD) can also be found among well-separated sites. LD blocks are runs of genomic sites all having appreciable LD with one another. High LD and LRLD are often separated by genomic sites with which they have practically no LD. Thus, not only can LD be found among distant loci, but also its pattern may be complex, comprised of fragmented blocks. Here, chicken LRLD and LD blocks, and their relationship with previously described Marek’s Disease (MD) QTLRs, were studied in an F6 population from a full-sib advanced intercross line, and in eight commercial pure layer lines. Genome wide LRLD was studied in the F6 population by random samples of non-syntenic and syntenic marker pairs. To illustrate the relationship with QTLRs, LRLD and LD blocks in and between the MD QTLRs were studied by all possible marker pairs.Results LRLD was defined as r2 ≥ 0.7 over a distance ≥ 1 Mb, and 1.5% of all syntenic marker pairs were classified as LRLD. Complex fragmented and interdigitated LD blocks were found, ranging over distances from a few hundred to a few millions bases. Vast high, long-range, and complex LD was found between two of the MD QTLRs. Cross QTLRs STRING networks and gene interactions suggested possible origins of the exceptional LD between these two QTLRs.ConclusionsAll sites with high LD with a significant marker should be considered as candidate for the causative mutation, but, unlike the custom assumption, the causative mutation is not necessarily the one closest to the significant marker. Rather, the present results show that it can be located at a much larger distance from a significant marker than previously appreciated, beyond closer mutations. Thus, LRLD range and LD block complexity must be accounted for while interpreting genetic mapping studies.


Long-range LD (LRLD)
Estimating LRLD by random samples of syntenic marker pairs. Pooled over all families, 418,075 pairs had a distance above 20 Mb; as expected, no high LD of r 2 ≥ 0.7 was found beyond 20 Mb (Additional Table 2).
Detailed inspection of the distances up to 20 Mb showed that all high LDs were in fact within 10 Mb (Additional Table 3).
Pooled over all families together, a total of 50,100 random marker pairs quali ed within the LRLD de nition, namely r 2 ≥ 0.7 over a distance ≥ 1 Mb. These LRLDs constitute 30.9% of all pairs within 20 Mb, and 1.5% of the total number of syntenic pairs tested (Additional Table 2).
Among the syntenic pairs, 0.016 had r 2 > 0.95, almost 15-times the proportion of the single LD value in this range (0.000001) found among the non-syntenic pairs (Additional Table 1). Thus, the proportion of syntenic high LD was not negligible. LRLDs were distributed over all autosomes in all ve F 6 families ( Figure 1; Additional Table 4; LD matrices in gshare portal). No LRLD was found on Chromosomes 22 in any of the families.
Though these LRLDs were obtained by random sampling of marker pairs, repeated similar locations of marker pairs suggest the existence of many LRLD blocks. This was indeed found by the LD analysis of the MD QTLRs (see below). F 6 MD QTLRs and random LRLD. To check for a possible relationship between the LRLDs found here and the F 6 MD QTLRs mapped in the same population (Table 1), LRLDs and QTLRs were aligned together (e.g., Figure 1 and LD matrices in gshare portal), and overlaps were counted (Additional Table 5).
As noted above, with all markers in an interval less than 1 Mb, no LRLD could be found on Chr 16 (Additional Table 4); hence, QTLR 32 was not included in any further analyses. Of the remaining 37 QTLRs, overlaps between 28 QTLRs and LRLDs were found in all families (the non-zeros under 'Families' in Additional Table 5). It seems remarkable that, even though only 1.5% of the random LD values were LDLR, no less than 75.7% of the mapped MD QTLRs overlapped LRLDs. Then again, in Galgal6, QTLRs averaged 1.4 Mb (Table 1), and random LRLDs averaged 2.2 Mb, from 1 to above 12 Mb (Additional Table 6). Thus, such overlap may not be so surprising, but a result of the abundance and size of the QTLRs and LRLDs.
Zooming in on QTLRs clearly showed the overlap between the LRLDs and QTLRs ( Figure 2). Not only was LRLD found within QTLRs, but LRLD was found between QTLRs 4 and 5 in all 5 families. The similar locations seen in Figure 2 suggest the presence of LD blocks shared by both QTLRs.

LD in the QTLRs in the F 6 families
The overlaps found in the F 6 families between random LRLDs and the MD QTLRs, led us to examine in more detail the LRLD and LD blocks in these QTLRs, with all informative markers of the ve F 6 families (note that this part used all pairs of informative markers in the QTLRs, and not only a sample of random pairs as in the rst LD analysis).
Chromosomes 1, 2, 4, 5, 6 and 14, harbored more than one QTLR (Table 1), thus enabling examination of LD in and between QTLRs. In each F 6 family, Affymetrix SNP array genotypes were used to calculate LD between all possible pairs of all markers in the 21 QTLRs on those chromosomes.
Hundreds of thousands of LRLDs were found in and between the tested QTLRs (Additional Table 7). Total number of marker pairs ranged from below 8 to above 10 million in a family, to a total of more than 43 million pairs. Of these, pooled over all families, 830,182 were LRLDs (62,103 -227,015 LRLDs in a family). These constitute 0.7 -2.6% of all pairs in a family, a total of 1.9%, higher than the 1.5% found among the random pairs over all autosomes (Additional Table 3).
Family 5 is an outlier in Additional Table 7, with a much lower number and proportion of total LRLDs and LRLDs across QTLRs compared to the other four families. Further inspection did not identify any source of this difference. Hence, we have no explanation other than sampling variation.
Pooling all families together, LRLDs were found in all 6 chromosomes examined (Table 5). No LRLD could be found in the QTLRs on Chromosomes 5 or 6 (Table 5), as no QTLR there was larger than 1 Mb (Table 1). However, LRLDs between QTLRs were also found in those two chromosomes.
In all families, LRLDs were found between most pairs of QTLRs (Additional Table 8 a-f). Exceptional among all pairs of QTLRs, an extremely large number of LRLDs (159,413) was found between QTLRs 4 and 5 on Chr 1 in all families, con rming the results of the random samples ( Figure 2). The tight LD between these two QTLRs was further con rmed by the LD blocks (below).
Thus, repeating in all F 6 families, LRLDs were found to be frequent, distributing within and between QTLRs in all chromosomes tested.
QTLR LD blocks LD Blocks in the F 6 QTLR As shown by the data a complicated LD pattern was found in the F 6 QTLR. Large, fragmented, and interdigitated LD blocks were found in all ve families over all six chromosomes examined (LD matrices in gshare portal). The range and complexity would have been even larger if moderate LD blocks were included, with 0.15 ≤ r 2 < 0.7.
An example of fragmented interdigitated blocks is presented in Figure 3a. Close examination of the LD found in Family 1 in this region shows the presence of 3 high LD blocks, all fragmented and all interdigitated with one another: Block 1 includes markers with ID numbers 134-141, 143, 145-149, and 151; Block 2 includes markers 142, 144 and 152; Block 3 includes markers 150 and 337. The fact that, despite their apparent fragmentation, these are indeed genuine blocks is shown in Figures   3 b-d. If the markers in Blocks 2 and 3 were not included in the analysis, (e.g., because they were not on the SNP array or were ltered out by the quality control or were not polymorphic in this family), then three clear unambiguous blocks would have been identi ed.
Note the distance between the markers in block 3, is above 0.5 Mb. Should the criterion of 0.25 Mb [6] been used, this block would be de ned as LRLD.
Blocks shared by QTLRs 4 and 5 in the F 6 families. In accordance with the random sampling of marker pairs and LRLDs in and between QTLRs in the F 6 , large and long-range LD blocks were shared by QTLRs 4 and 5 in all ve families, as exempli ed in Additional Figure 1 and detailed in Additional Tables 8 a-f. In Additional Figure 1, the high LD block distributed from the rst marker of QTLR 4 to close to the end of QTLR 5, over 5.7 Mb, with 412 markers included. Considering moderate LD of 0.15 ≤ r 2 < 0.7, would stretch the block all the way to the end of QTLR 5, over more than 7.1 Mb. Thus, the exceptional LD between QTLRs 4 and 5 indicated by the random sample of pairs was con rmed in all F 6 families by both LRLDs and LD blocks between QTLRs.
LD among QTLR elements in the eight pure lines LD of elements within and between the F 6 QTLRs was further examined within eight Hy-Line elite pure lines. Complex LD blocks between elements within and across QTLRs were found, similar to that found in the F 6 families, over distances from a few bp to a few Mb (Figure 4-6; all LD matrices are in gshare portal).
LD within one QTLR gene. Figure 4 present an example of LD blocks within the QTLR gene TRANK1 in Line WL1. Despite the short distances (390 bp to 14.5 Kb), a complex pattern was found, with 2 LD blocks, one of which is fragmented around the other. There was high to complete LD between markers 5, 8 and 13-36. These markers had practically no LD with markers 11 and 12, which were in complete LD with one another. Thus, in the gene TRANK1 in Line WL1, Block 1 starts before, but ends after Block 2. The association test P values [14] completely matched the LD blocks, with the same or close P values in each block. This match was found in all other combinations of QTLR -line ( Figures 5 -7).
LD between QTLR elements. An example of a more complex LD pattern with interdigitated blocks is shown in Figure 5, this time across QTLR elements (3 lncRNAs).
Careful inspection of Figure 5  LD was found between other types of QTLR elements as well. Figure 6 present such LD between the QTLR genes TLR4 and BRINP1 in QTLR 33 on Chr 17. The rst marker of TLR4 has high LD to the rst 2 markers of BRINP1, and the 3 markers are not linked to other markers of their own gene. The other 6 markers of TLR4 form a tight LD block. Complexing it even further, the last marker of BRINP1 (Marker 15) had low to moderate LD with all markers in QTLR 33, both genes included.
LD between QTLRs 4 and 5. Markers on both QTLRs 4 and 5 were informative only in Lines WL3, WPR1, WPR2, and RIR1, up to only 4 markers in a line in QTLR 4 (Lines' LD matrices in gshare portal). Thus, information on the LD between the QTLRs was limited in this dataset. Nevertheless, in accord with the random LDs in the F 6 families (Figures 1 and 2) and cross QTLRs LRLD in these families (Additional Tables 8 b-f), moderate LD blocks among elements in these QTLRs crossed their boundaries in Lines WPR1, WPR2 and RIR1. In Line WRP1, 2 clear high LD blocks were found, one in each QTLR ( Figure 7). However, the 2 QTLR blocks had moderate LDs of r 2 = 0.478 among them, thus forming one moderate LD block. Note that the distances between the cross QTLR pairs, varied from 5.135 to 5.138 Mb.
Looking for a source of such vast, high, and complex long-range LD between QTLRs 4 and 5, a bioinformatics search found 10 and 68 genes in QTLRs 4 and 5, respectively ( Figure 8 and Table 6). STRING network analysis revealed ve networks of 2 to 28 genes. Two of the networks ('Net' 2 and 3 in Table 6), are comprised of genes from both QTLRs ( Figure 8 and Table 6). Of 'Net' 2, the 2 genes in QTLR 4 and 17 of the 26 genes in QTLR 5 are located in the LD blocks extending over the two QTLRs found in F 6 ('+' in the column 'B4-5' in Table 6). Both genes in 'Net' 3 are in those blocks. Finally, the two networks with genes from both QTLRs included 6 genes interacting with a gene from another QTLR (Figure 8), all of which located in the cross QTLRs LD blocks. The gene networks and interactions shared by both QTLRs could be the origin of the LD between QTLRs 4 and 5. In fact, the phenomenon of genes whose products work together tending to be on the same chromosomal region is quite common. For example, the Major Histocompatibility Complex (MHC) on chicken chromosome 16 and the Regulators of Complement Activation cluster (RCA) on chromosome 26 [19,20,11]. In fact, the networks presented in Figure 8 is a good examples for this colocation of genes working together.

Discussion
Chicken LD over a range of distances, and patterns of LD blocks, were studied in ve F 6 families from a Full Sib Advanced Intercross Line (FSAIL), and eight commercial pure layer lines, thus allowing to study the repetition of the results. LRLD was studied in the F 6 population by random non-syntenic and syntenic samples of marker pairs genotyped by the Affymetrix HD SNP array. In face of the LRLD results, and to illustrate the importance of LRLD to QTL mapping results, LRLD and LD blocks were studied with all possible marker pairs of all markers in previously described MD QTLRs [14].
This study started with SNP location information from the previous chicken genome build Galgal4, and was subsequently updated to the Galgal6 assembly. This change necessitated remapping of the QTLRs described in Smith et al. (2020) [14], resulting in negligible changes of most QTLR coordinates. Nevertheless, the change of genome versions moved a segment of Galgal4 QTLR 1 (Chr 1) to Galgal6 QTLR 4 (Chr 1) (Appendix). The moved segment included one QTLR lncRNA described in Smith et al. (2020) [14], thus emphasizing the importance of basing genomic analyses on the most updated genome version. These results also present the power of LD to identify mapping errors, as already noted by Utsunomiya et al. (2016) [10].
Long-range LD (LRLD) was de ned as r 2 ≥ 0.7 over a distance ≥ 1 Mb. These criteria are more restricted than previously used [6,22]. Nevertheless, repeated appreciable numbers of LRLDs within chromosomes were found repeated in all ve F 6 families by the random sampling of syntenic marker pairs, far above the numbers of high LD found between non-syntenic markers from different chromosomes. The LRLDs were further found in all ve F 6 families by all QTLR array markers on chromosomes with more than one QTLR. These results could be an underestimate, as the F 6 population was designed to fragment the genome for high-resolution QTL mapping [23,24].
High LD blocks were de ned as a group of markers located on the same chromosome, having r 2 ≥ 0.7 with each other, even if markers with low LD appeared between them. This de nition allowed "a look over the horizon" and identi cation of complex blocks. The phenomenon of fragmented and interdigitated LD blocks were repeatedly found in all ve F 6 families and in all eight pure lines over a vast range of distances, from hundreds of bp to mega bases. The FSAIL population was composed of ve families, and they showed similar results. Strength of this analysis was that it repeated ve times. Then the same phenomenon was seen in the eight elite lines, further adding strength to the validity of these results, which also agree with previous studies from us and others [e.g., 3,11,12,13].
A strong linkage was found between QTLRs 4 and 5. LRLD between them was found while analyzing the F 6 random samples of SNPs within all autosomes. High LD blocks were found in all ve F 6 families, comprised of markers from both QTLRs.
Moderate LD blocks between QTLRs 4 and 5 were also found in 3 of the 8 pure lines by QTLR elements' markers. These results raise the question of what elements on both QTLRs are in high LD over such large distances. Of course, being MD QTLRs, the LD between the QTLRs could be a result of a co-selection for MD resistance. But what make these two QTLRs so different from all other pairs of QTLRs? Why is their LD so exceptional?
To answer this, we looked at the gene content of QTLRs 4 and 5. Ten and 68 genes were found in QTLRs 4 and 5, respectively. STRING protein network analysis revealed ve gene networks. Two of the networks include genes from both QTLRs, most of which are located within the LD blocks extending over the two QTLRs found in the F 6 . All 6 genes interacting with a gene from another QTLR are in the cross QTLRs LD blocks. Obviously, the shared gene networks and interactions could be the origin of the LD between QTLRs 4 and 5. However, assessing the uniqueness of the LD between the QTLRs necessitates further study on the distribution of cross QTLR networks and interactions among other pairs of QTLRs with less LD among them. Furthermore, assessing the real effect of the gene networks and interactions needs more molecular, quantitative and population studies. All of these are beyond the scope of the present study.
In general, the complex LD found in this study could stem from technical reasons such as sampling variation. It can also be a result of mapping errors, as was indeed found in this study for regions on chromosome 1 with build 4 (Appendix). However, it could have genuine biological meaning, through processes such as co-evolution of genomic sites (as a result natural or arti cial selection), gene conversion, copy number variation, population bottlenecks, non-random mating, and epistasis. The repeatability in different analyses, different datasets and different populations, and the agreement with previous reports strengthen the case for the present results as being a genuine biological phenomenon.

Conclusions
The observed LRLD and fragmented interdigitated LD blocks imply that the causative element is not necessarily the closest, or even close at all to the signi cant marker, and maybe not even to the signi cant block of markers. Thus, mapping results and searches for causative elements must consider the complexity of the LD.

Populations
Nine populations described by Smith et al. (2020) [14] were again used in the present [14] study. These comprised ve families of F 6 birds from an FSAIL used by Smith et al. (2020) [14] to map QTLRs affecting MD resistance, and eight pure lines used in the same study to test these QTLRs.
A priory, it is expected that LRLD in F 6 will be at higher frequency than in pure lines, because the families start with only 4 chromosomes each, and there are only a few generation of intercrossing to break up the haplotype blocks. However, this study did not compare populations, families or lines. Rather, it aim to present the phenomena of LD long range and complexity.

Trait
The trait for which these QTLR are associated is resistance to the avian oncogenic alpha herpes virus, Marek's Disease (MD virus [14]. This trait association data set was used, as the phenotype and genotype information was available. It is used as an illustration for the QTLR and LD associations that were identi ed.

Genotypes
Only the autosomal genotypes as used by Smith et al. (2020) [14] to map and test QTLRs were used in the present study. Genotypes on the Z chromosome will be analyzed in detail in a different manuscript. Genotypes were obtained from the HD 600K Affymetrix SNP chicken array [25] in the F 6 population, and marker genotypes used to test the QTLRs by the eight pure lines [14]. The only difference was that instead of a minimum MAF ≥ 0.01 used for the association tests by Smith et al. (2020) [14], a threshold of 0.10 was used here for the LD analysis, to avoid spurious high LD due to rare alleles [7]. However, to test LD with exactly the same markers used for association tests in the eight pure lines, the threshold of MAF ≥ 0.01 was also used for LD for these lines.

Genome assemblies and remapping QTLRs
As described in Smith et al. (2020) [14], the initial analysis in this study was based on the Galgal4 genome build. The Lift Genome Annotations tool [26] within the UCSC Browser was used to remap markers from Galgal4 to Galgal6 (GRCg6a; Acc. No.: GCA_000002315.5). The new coordinates were then used to remap the F 6 QTLRs as was done by Smith et al. (2020) [14].
The results were very similar on both assemblies, and hence only the results from Galgal6 will be presented here. Nevertheless, there was one change worth noting, detailed in the Appendix. Non-syntenic LD. Background LD over all autosomes was estimated utilizing Affymetrix 600K genotypes of two random combined samples with return of non-syntenic marker pairs from each of the ve F 6 families.
Long-Range LD (LRLD). Koch et al. (2013) [6] used all pairs of SNPs on each human chromosome and de ned LRLD between haplotype-blocks rather than between SNP pairs. However, due to computer limitations, we used samples of random SNP pairs on the Affymetrix 600K SNP array to assess LRLD in the autosomes of the ve F 6 families. Koch  present results, in this study we de ned LRLD as a marker pair with r 2 ≥ 0.7 over a distance ≥ 1.0 Mb. As noted in the Introduction, LRLD between elements in a QTLR and across QTLRs can indicate a relationship between the element and between the QTLRs. Hence, to illustrate its importance, LRLD in and between MD QTLRs [14] was studied by all F 6 Affymetrix genotypes in the QTLRs. F 6 random LRLD and QTLRs. F 6 LRLD of random marker pairs were aligned by chromosomal location with the F 6 MD QTLRs [14], and all overlaps between the LRLDs and the QTLRs were counted. LD blocks. High LD blocks were de ned as a group of markers located on the same chromosome having r 2 ≥ 0.7 with each other. The de nition was applied even if markers with low LD appeared between the LD markers. This de nition allowed a "look over the horizon" and identi cation of fragmented and interdigitated blocks.

Gene network analysis
To investigate the genes underlying QTLRs 4 and 5, the BioMart tool within the Ensembl database (https://www.ensembl.org/info/data/biomart/index.html) was used to identify genes in these regions. These identi ed genes were then subject to network analysis using the STRING database (v11) [27], which provides an overview of known protein interactions.

Competing interests
The authors declare that they have no competing interests.     All, all families combine; r 2 ≥ (last 2 rows), frequencies of r 2 above the indicated value.   Figure 8, given by order of location of the rst gene (not by order of appearance in Figure 8); Net bolded, net comprise of genes from both QTLRs; B4-5, location in high LD blocks extending over the two QTLRs in the ve F 6 families. Figure 1 Distribution of random LRLDs over Chr 1 and overlaps with QTLRs (Table 1) in F6 Family 1. Location of marker pairs plotted against r2. Start, End, locations of the markers in a pair (for each Start dot there is a matched End dot; see Figure 2 for clarity); numbers, QTLR numbers (Table 1).

Figure 2
Overlaps between QTLRs 4-5 and random LRLD in all F6 families. Fam, family. There were 5 LRLDs in Family 1, 1 LRLD in Families 2 -4, and 2 LRLDs in Family 5. Each series of an LRLD is composed of two circles connected by a single line; the circles represent the locations on the x-axis of 2 random markers constitute a pair, and the LD r2 of this pair is presented on the y-axis. There could be more than one LRLD in a family. Similarly, the series of QTLRs 4 and 5 are presents by squares connected by a double line; the squares represent the locations of the QTLR boundaries. QTLRs do not have an r2 of course; they are simply presented on the x-axis, aligned with the LRLD.
Page 20/23    LD blocks across QTLR elements. Line WL3, QTLR 3 on chromosome 1. B, block serial number ordered by the location of the rst marker (same LD block have the same color); Mb, location on Galgal6 in Mb; Dist., distance in Mb; E, QTLR element [14]; Red, r2 ≥ 0.7; similar P values has similar colors. LD between QTLR genes. Line WL2, QTLR 33 on chromosome 17. B, block serial number ordered by the location of the rst marker (same LD block have the same color); Mb, location on Galgal6 in Mb; Dist., distance in Mb; E, QTLR element [14]; Red, r2 ≥ 0.7; purple, 0.15 ≤ r2 < 0.7; same LD block have the same color; similar P values has similar colors. LD between QTLRs 4 and 5. Line WPR1, the lncRNA01 in QTLR 4 and the gene CSTA in QTLR 5. Q, QTLR serial number (Table  1); B, high LD block serial number ordered by the location of the rst marker (same LD block have the same color); Mb, location on Galgal6 in Mb; Dist., distance in Mb; E, QTLR element [14]; Red, r2 ≥ 0.7; purple, 0.15 ≤ r2 < 0.7; same LD block have the same color; similar P values has similar colors.