Remapping QTLRs from Galgal4 to Galgal6. The new coordinates of the markers on Galgal6 and the association results obtained by Smith et al. (2020) [14] were used to remap the MD QTLRs identified with Galgal4 by Smith et al. (2020) [14]. The same 38 QTLRs were found on each genome build (Table 1). Most changes were negligible, except for one fragment on Chr 1 that moved about 70 Mb, from QTLR 1 to QTLR 4, and that includes a QTLR lncRNA tested by Smith et al. (2020) [14]. This change is detailed in the Appendix. The new QTLR coordinates on Galgal6 were used for the LD analyses in this study.
Linkage disequilibrium in the F6 population based on 600K genotyping SNP array
Non-syntenic random LD. A total of 923,183 random non-syntenic pairs of markers from different autosomes were used to assess the background level of LD, with an average of 184,636.6 pairs per family (Table 2). LD averaged 0.011 ± 0.016, comparable to previous reports in chicken [12, 15], but about ten times higher than values reported in mammalian populations: horse [5], sheep [4], and cattle [16, 17]. These differences may reflect differences in experimental design, in population sample size, history and structure, or in biology between birds and mammals.
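As an illustration of the quantity being averaged here, the following minimal sketch computes r2 for one marker pair as the squared Pearson correlation of genotype allele counts. This section does not state how r2 was calculated, so the genotype-correlation estimator, the 0/1/2 coding, and the function name r2_genotypes are all assumptions made for illustration only.

```python
import numpy as np

def r2_genotypes(g1, g2):
    """Squared Pearson correlation between two genotype vectors.

    Assumption (not from the paper): genotypes are coded as 0/1/2 copies
    of one allele, with NaN for a missing call.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    # Keep only individuals genotyped at both markers.
    keep = ~(np.isnan(g1) | np.isnan(g2))
    g1, g2 = g1[keep], g2[keep]
    # r2 is undefined for monomorphic markers or fewer than two individuals.
    if g1.size < 2 or g1.std() == 0 or g2.std() == 0:
        return float("nan")
    r = np.corrcoef(g1, g2)[0, 1]
    return float(r * r)

# Hypothetical toy genotypes for six individuals at two markers.
marker_a = [0, 1, 2, 0, 1, 2]
marker_b = [0, 1, 2, 1, 1, 2]
print(r2_genotypes(marker_a, marker_b))
```

Averaging this value over many marker pairs drawn from different autosomes would give a background estimate of the kind summarized in Table 2.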
Mean family LDs and standard deviations (SDs) were strongly and significantly negatively correlated with the size of the population sample (i.e., Ind/pair): r = -0.895 (P = 0.040) and r = -0.884 (P = 0.046), respectively. These correlations accord with our previous report in chicken [12] and with the expectation of Sved (1971) [18].
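The dependence of background LD on sample size can be illustrated with a small simulation. The sketch below draws independent (unlinked) markers and shows that the mean sample r2 shrinks roughly as 1/n with the number of individuals n. The allele frequency of 0.5, the sample sizes, and the function name mean_background_r2 are hypothetical choices for illustration; none of them come from the study data.

```python
import numpy as np

def mean_background_r2(n_ind, n_pairs=2000, seed=0):
    """Mean r2 between simulated, truly independent marker pairs.

    Assumptions (illustrative only): biallelic markers, allele
    frequency 0.5, genotypes coded 0/1/2, no missing data.
    """
    rng = np.random.default_rng(seed)
    a = rng.binomial(2, 0.5, size=(n_pairs, n_ind)).astype(float)
    b = rng.binomial(2, 0.5, size=(n_pairs, n_ind)).astype(float)
    # Center each marker, then compute the squared correlation per pair.
    a -= a.mean(axis=1, keepdims=True)
    b -= b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1) ** 2
    den = (a * a).sum(axis=1) * (b * b).sum(axis=1)
    ok = den > 0  # skip pairs in which a marker happens to be monomorphic
    return float((num[ok] / den[ok]).mean())

for n in (20, 50, 200):
    print(n, round(mean_background_r2(n), 4))
```

Because sampling alone produces non-zero r2 between unlinked markers, families with fewer individuals per pair are expected to show higher mean background LD, in line with the negative correlations reported above.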
With a mean LD of 0.011 ± 0.016, any r2 > 0.043 is above the background LD. Indeed, combining all five F6 families together, only 3.5% of the r2 values were above 0.05 (Additional Table 1). A single high LD of r2 = 0.991 was found in Family 2. Without any replication, this was treated as a sampling effect. Based on these results, a conservative critical LD value of r2 ≥ 0.15 was chosen for defining significant LD.
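The value of 0.043 is reproduced by taking the mean background LD plus two standard deviations. The text does not state this rule explicitly, so the sketch below is an assumption that happens to match the reported number; the function name and the multiplier k are hypothetical.

```python
def background_cutoff(mean_r2, sd_r2, k=2.0):
    """Upper bound of background LD as mean + k * SD (assumed rule)."""
    return mean_r2 + k * sd_r2

# Values reported for the non-syntenic pairs (Table 2).
cutoff = background_cutoff(0.011, 0.016)
print(round(cutoff, 3))
```

The critical value of r2 ≥ 0.15 adopted in the text is well above this bound, which is why it is described as conservative.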
Syntenic random LD. A total of 1,008,823 random syntenic marker pairs were used to assess the level of random LD on the same chromosome (Table 3). Distance between markers in the random pairs varied from 11 to 197,038,449 bp, with an average of 28,976,195.6 bp. Means of r2 were all in a close range around 0.11, averaging 0.114, ten times the means obtained for the non-syntenic LD. Although obtained from random marker pairs, some of which are at a long distance from one another, these means suggest the presence of a large number of LDs above the background LD of 0.011 (Table 2). The expected negative correlation between distance and LD was again obtained in all five families (Table 3).
In all families, about two-thirds of the LD values were up to 0.05, dropping rapidly thereafter (Table 4). Interestingly, in all families there was an increase in the range of r2 > 0.85, suggesting the existence of large high-LD blocks. Pooled over all families, the proportion of r2 ≥ 0.15, set conservatively as the threshold of significance based on the non-syntenic LD, was almost 0.2 (Table 4), while less than 5% of the LD values were above 0.7. Hence, the range of 0.15 ≤ r2 < 0.7 was set as low to moderate LD and used to define moderate LD blocks, and r2 ≥ 0.7 was set as high LD and used to define LRLD and high LD blocks.
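The LD classes defined in this paragraph can be written as a small helper. The thresholds are those given in the text; the function name and the class labels are hypothetical.

```python
def ld_class(r2, significant=0.15, high=0.7):
    """Classify an r2 value using the thresholds defined in the text."""
    if r2 >= high:
        return "high"          # used for LRLD and high LD blocks
    if r2 >= significant:
        return "moderate"      # low to moderate LD, 0.15 <= r2 < 0.7
    return "background"        # below the significance threshold

for value in (0.05, 0.15, 0.478, 0.7, 0.991):
    print(value, ld_class(value))
```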
Long-range LD (LRLD)
Estimating LRLD by random samples of syntenic marker pairs. Pooled over all families, 418,075 pairs had a distance above 20 Mb; as expected, no high LD of r2 ≥ 0.7 was found beyond 20 Mb (Additional Table 2).
Detailed inspection of the distances up to 20 Mb showed that all high LDs were in fact within 10 Mb (Additional Table 3).
Pooled over all families together, a total of 50,100 random marker pairs qualified within the LRLD definition, namely r2 ≥ 0.7 over a distance ≥ 1 Mb. These LRLDs constitute 30.9% of all pairs within 20 Mb, and 1.5% of the total number of syntenic pairs tested (Additional Table 2).
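The LRLD definition used here (r2 ≥ 0.7 over a distance ≥ 1 Mb) amounts to a two-condition filter on each marker pair. The sketch below applies it to a few hypothetical pairs; the positions and r2 values are invented for illustration and are not taken from the study.

```python
def is_lrld(pos_a, pos_b, r2, min_r2=0.7, min_dist_bp=1_000_000):
    """True if a syntenic marker pair meets the LRLD definition in the text."""
    return r2 >= min_r2 and abs(pos_b - pos_a) >= min_dist_bp

def lrld_fraction(pairs):
    """Fraction of (pos_a, pos_b, r2) pairs that qualify as LRLD."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_lrld(a, b, r2) for a, b, r2 in pairs) / len(pairs)

# Hypothetical pairs on one chromosome: (position 1, position 2, r2).
toy_pairs = [
    (1_000_000, 2_500_000, 0.92),  # high LD, 1.5 Mb apart: LRLD
    (1_000_000, 1_200_000, 0.95),  # high LD, but only 0.2 Mb apart
    (5_000_000, 9_000_000, 0.30),  # far apart, but only moderate LD
    (3_000_000, 4_000_000, 0.70),  # exactly on both thresholds: LRLD
]
print(lrld_fraction(toy_pairs))
```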
Among the syntenic pairs, a proportion of 0.016 had r2 > 0.95, almost 15,000 times the proportion of the single LD value in this range (0.000001) found among the non-syntenic pairs (Additional Table 1). Thus, the proportion of syntenic high LD was not negligible. LRLDs were distributed over all autosomes in all five F6 families (Figure 1; Additional Table 4; LD matrices in figshare portal). No LRLD was found on Chromosome 22 in any of the families.
Though these LRLDs were obtained by random sampling of marker pairs, repeated similar locations of marker pairs suggest the existence of many LRLD blocks. This was indeed found by the LD analysis of the MD QTLRs (see below).
F6 MD QTLRs and random LRLD. To check for a possible relationship between the LRLDs found here and the F6 MD QTLRs mapped in the same population (Table 1), LRLDs and QTLRs were aligned together (e.g., Figure 1 and LD matrices in figshare portal), and overlaps were counted (Additional Table 5).
As noted above, with all markers in an interval of less than 1 Mb, no LRLD could be found on Chr 16 (Additional Table 4); hence, QTLR 32 was not included in any further analyses. Of the remaining 37 QTLRs, overlaps between 28 QTLRs and LRLDs were found in all families (the non-zeros under ‘Families’ in Additional Table 5). It seems remarkable that, even though only 1.5% of the random LD values were LRLDs, no less than 75.7% of the mapped MD QTLRs overlapped LRLDs. Then again, in Galgal6, QTLRs averaged 1.4 Mb (Table 1), and random LRLDs averaged 2.2 Mb, ranging from 1 to above 12 Mb (Additional Table 6). Thus, such overlap may not be so surprising, but rather a result of the abundance and size of the QTLRs and LRLDs.
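Counting overlaps of this kind is an interval-intersection problem. The following minimal sketch counts how many QTLRs overlap at least one LRLD span on the same chromosome and reproduces the 75.7% figure from the counts given in the text (28 of 37). The interval coordinates are hypothetical, and the function names are not from the paper.

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def count_overlapped(qtlrs, lrlds):
    """Number of QTLRs overlapping at least one LRLD on the same chromosome.

    Both arguments are lists of (chromosome, start_bp, end_bp) tuples.
    """
    n = 0
    for chrom, start, end in qtlrs:
        if any(c == chrom and intervals_overlap(start, end, ls, le)
               for c, ls, le in lrlds):
            n += 1
    return n

# Hypothetical coordinates, for illustration only.
toy_qtlrs = [("1", 100, 200), ("1", 500, 600), ("2", 100, 200)]
toy_lrlds = [("1", 150, 400), ("2", 300, 900)]
print(count_overlapped(toy_qtlrs, toy_lrlds))

# Percentage reported in the text: 28 of the 37 remaining QTLRs.
print(round(100 * 28 / 37, 1))
```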
Zooming in on QTLRs clearly showed the overlap between the LRLDs and QTLRs (Figure 2). Not only was LRLD found within QTLRs, but LRLD was found between QTLRs 4 and 5 in all 5 families. The similar locations seen in Figure 2 suggest the presence of LD blocks shared by both QTLRs.
LD in the QTLRs in the F6 families
The overlaps found in the F6 families between random LRLDs and the MD QTLRs led us to examine in more detail the LRLDs and LD blocks in these QTLRs, with all informative markers of the five F6 families (note that this part used all pairs of informative markers in the QTLRs, and not only a sample of random pairs as in the first LD analysis).
Chromosomes 1, 2, 4, 5, 6 and 14 each harbored more than one QTLR (Table 1), thus enabling examination of LD within and between QTLRs. In each F6 family, Affymetrix SNP array genotypes were used to calculate LD between all possible pairs of all markers in the 21 QTLRs on those chromosomes.
Hundreds of thousands of LRLDs were found in and between the tested QTLRs (Additional Table 7). The number of marker pairs ranged from below 8 million to above 10 million per family, for a total of more than 43 million pairs. Of these, pooled over all families, 830,182 were LRLDs (62,103 - 227,015 LRLDs in a family). These constitute 0.7 - 2.6% of all pairs in a family, and 1.9% in total, higher than the 1.5% found among the random pairs over all autosomes (Additional Table 3).
A total of 161,832 LRLDs were found between QTLRs (Additional Table 7), 19.5% of all LRLDs found (0.6 - 24.9 % among the families).
Family 5 is an outlier in Additional Table 7, with a much lower number and proportion of total LRLDs and LRLDs across QTLRs compared to the other four families. Further inspection did not identify any source of this difference. Hence, we have no explanation other than sampling variation.
Pooling all families together, LRLDs were found on all 6 chromosomes examined (Table 5). No LRLD could be found within the QTLRs on Chromosomes 5 or 6 (Table 5), as no QTLR there was larger than 1 Mb (Table 1). However, LRLDs between QTLRs were found on those two chromosomes as well.
In all families, LRLDs were found between most pairs of QTLRs (Additional Table 8 a-f). Exceptional among all pairs of QTLRs, an extremely large number of LRLDs (159,413) was found between QTLRs 4 and 5 on Chr 1 in all families, confirming the results of the random samples (Figure 2). The tight LD between these two QTLRs was further confirmed by the LD blocks (below).
Thus, consistently across all F6 families, LRLDs were frequent and were distributed within and between QTLRs on all chromosomes tested.
QTLR LD blocks
LD Blocks in the F6 QTLR
As shown by the data, a complicated LD pattern was found in the F6 QTLRs. Large, fragmented, and interdigitated LD blocks were found in all five families over all six chromosomes examined (LD matrices in figshare portal). The range and complexity would have been even larger if moderate LD blocks, with 0.15 ≤ r2 < 0.7, were included.
An example of fragmented interdigitated blocks is presented in Figure 3a. Close examination of the LD found in Family 1 in this region shows the presence of 3 high LD blocks, all fragmented and all interdigitated with one another: Block 1 includes markers with ID numbers 134-141, 143, 145-149, and 151; Block 2 includes markers 142, 144 and 152; Block 3 includes markers 150 and 337. That these are genuine blocks, despite their apparent fragmentation, is shown in Figures 3b-d. If the markers in Blocks 2 and 3 were not included in the analysis (e.g., because they were not on the SNP array, were filtered out by quality control, or were not polymorphic in this family), then three clear, unambiguous blocks would have been identified.
Note that the distance between the markers in Block 3 is above 0.5 Mb. Had the criterion of 0.25 Mb [6] been used, this block would have been defined as LRLD.
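The idea of fragmented, interdigitated blocks can be made concrete with a toy example. The sketch below groups markers into blocks by linking any two markers with r2 ≥ 0.7 and then checks whether the physical spans of two blocks overlap although the blocks share no marker. The block-calling procedure used in the study is not described in this section, so this single-linkage grouping, the function names, and the toy r2 matrix are all assumptions for illustration.

```python
import numpy as np

def high_ld_blocks(r2, threshold=0.7):
    """Group markers linked by r2 >= threshold (single linkage, union-find).

    Markers are assumed to be indexed in physical order along the chromosome.
    """
    n = r2.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if r2[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Report only groups with at least two markers.
    return sorted(g for g in groups.values() if len(g) > 1)

def interdigitated(block_a, block_b):
    """True if the spans of two marker-disjoint blocks overlap."""
    return min(block_a) < max(block_b) and min(block_b) < max(block_a)

# Toy matrix for 5 ordered markers: {0, 1, 3} and {2, 4} are in high LD.
r2 = np.full((5, 5), 0.02)
np.fill_diagonal(r2, 1.0)
for i, j in [(0, 1), (0, 3), (1, 3), (2, 4)]:
    r2[i, j] = r2[j, i] = 0.9

blocks = high_ld_blocks(r2)
print(blocks, interdigitated(blocks[0], blocks[1]))
```

In this toy case, marker 2 sits physically inside the first block without belonging to it, which is the pattern described for Blocks 1 to 3 in Figure 3a.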
Blocks shared by QTLRs 4 and 5 in the F6 families. In accordance with the random sampling of marker pairs and with the LRLDs in and between QTLRs in the F6, large and long-range LD blocks were shared by QTLRs 4 and 5 in all five families, as exemplified in Additional Figure 1 and detailed in Additional Tables 8 a-f. In Additional Figure 1, the high LD block extends from the first marker of QTLR 4 to close to the end of QTLR 5, over 5.7 Mb, and includes 412 markers. Including moderate LD (0.15 ≤ r2 < 0.7) would stretch the block all the way to the end of QTLR 5, over more than 7.1 Mb. Thus, the exceptional LD between QTLRs 4 and 5 indicated by the random sample of pairs was confirmed in all F6 families by both LRLDs and LD blocks between QTLRs.
LD among QTLR elements in the eight pure lines
LD of elements within and between the F6 QTLRs was further examined within eight Hy-Line elite pure lines. Complex LD blocks between elements within and across QTLRs were found, similar to those found in the F6 families, over distances from a few bp to a few Mb (Figures 4-6; all LD matrices are in figshare portal).
LD within one QTLR gene. Figure 4 presents an example of LD blocks within the QTLR gene TRANK1 in Line WL1. Despite the short distances (390 bp to 14.5 Kb), a complex pattern was found, with 2 LD blocks, one of which is fragmented around the other. There was high to complete LD between markers 5, 8 and 13-36. These markers had practically no LD with markers 11 and 12, which were in complete LD with one another. Thus, in the gene TRANK1 in Line WL1, Block 1 starts before, and ends after, Block 2. The association test P values [14] completely matched the LD blocks, with the same or close P values in each block. This match was found in all other combinations of QTLR and line (Figures 5 - 7).
LD between QTLR elements. An example of a more complex LD pattern with interdigitated blocks is shown in Figure 5, this time across QTLR elements (3 lncRNAs).
Careful inspection of Figure 5 shows 2 interdigitated blocks: Block 1 includes Markers 6-8, 15, and 19-20; Block 2 comprises Markers 11-14, 16-17 and 30-33. Thus, the high LD blocks extend over the 3 QTLR lncRNAs. The middle lncRNA05 is split between the 2 blocks. Some of its markers are in LD with the upstream lncRNA02, while other markers of the same lncRNA05 form a block with the downstream lncRNA04. The 2 groups of lncRNA05 markers are interdigitated. That is, Markers 6-8 of lncRNA02 are in the same block with 2 separate regions in the next lncRNA05 (Marker 15 and then Markers 19-20), but not with the other markers in the same lncRNA; Markers 12-14 and 19-20 of lncRNA05 are in LD with all 4 markers of lncRNA04. It would be interesting to find out what the sources of such complex LD patterns are.
LD was found between other types of QTLR elements as well. Figure 6 presents such LD between the QTLR genes TLR4 and BRINP1 in QTLR 33 on Chr 17. The first marker of TLR4 is in high LD with the first 2 markers of BRINP1, and these 3 markers are not in LD with the other markers of their own genes. The other 6 markers of TLR4 form a tight LD block. Complicating the picture even further, the last marker of BRINP1 (Marker 15) had low to moderate LD with all markers in QTLR 33, both genes included.
LD between QTLRs 4 and 5. Markers on both QTLRs 4 and 5 were informative only in Lines WL3, WPR1, WPR2, and RIR1, with no more than 4 markers in a line in QTLR 4 (Lines' LD matrices in figshare portal). Thus, information on the LD between the QTLRs was limited in this dataset. Nevertheless, in accord with the random LDs in the F6 families (Figures 1 and 2) and the cross-QTLR LRLDs in these families (Additional Tables 8 b-f), moderate LD blocks among elements in these QTLRs crossed their boundaries in Lines WPR1, WPR2 and RIR1. In Line WPR1, 2 clear high LD blocks were found, one in each QTLR (Figure 7). However, the 2 QTLR blocks had moderate LD of r2 = 0.478 between them, thus forming one moderate LD block. Note that the distances between the cross-QTLR pairs varied from 5.135 to 5.138 Mb.
Looking for a source of such vast, high, and complex long-range LD between QTLRs 4 and 5, a bioinformatics search found 10 and 68 genes in QTLRs 4 and 5, respectively (Figure 8 and Table 6). STRING network analysis revealed five networks of 2 to 28 genes. Two of the networks ('Net' 2 and 3 in Table 6) comprise genes from both QTLRs (Figure 8 and Table 6). Of 'Net' 2, the 2 genes in QTLR 4 and 17 of the 26 genes in QTLR 5 are located in the LD blocks extending over the two QTLRs found in the F6 ('+' in the column 'B4-5' in Table 6). Both genes in 'Net' 3 are in those blocks. Finally, the two networks with genes from both QTLRs included 6 genes interacting with a gene from another QTLR (Figure 8), all of which are located in the cross-QTLR LD blocks. The gene networks and interactions shared by both QTLRs could be the origin of the LD between QTLRs 4 and 5. Indeed, the phenomenon of genes whose products work together tending to lie in the same chromosomal region is quite common. Examples include the Major Histocompatibility Complex (MHC) on chicken chromosome 16 and the Regulators of Complement Activation cluster (RCA) on chromosome 26 [19, 20, 11]. The networks presented in Figure 8 are good examples of this colocation of genes working together.