Effects of Different Mutation Rates and Randomness on the Distribution of Linkage Disequilibrium

Background: Mutation has recently received much attention for its role in the evolution and genetics of complex traits. The linkage disequilibrium (LD) distribution can be affected by mutation, as reported in a recent study in which the same mutation rates were adopted in the transition matrix. However, the effects of different types, rates and randomness of mutation on LD distribution remain unexplored.
Results: Here, we considered mutations of different types and rates at each locus in the transition matrix (i.e. nucleotide transitions and transversions treated differently) to examine how the LD distribution between two genetic loci was affected. After considering, in turn, effective population size, recombination and selection, we found that different mutation types and rates could further change the dynamics of LD distribution. However, at the current scale of mutation rates (weak, at 10^-9 to 10^-8), mutation seemed to play only a minor role compared with recombination and selection. A simple model further showed that mutation randomness increased the ruggedness of LD curves, which fluctuated around the steady state.
Conclusions: Taken together, different mutation rates and randomness could further disturb the dynamics of LD distribution. Our findings can help better understand the role of mutation in molecular evolution and complex trait genetics.

transition matrix approach was proposed recently, in which the same mutation rate (10^-9) was assumed for all nucleotides []]. However, different mutation types (transition or transversion) and rates could affect LD distribution. In the field of computational molecular evolution, the transition matrix has also been used to calculate evolutionary distances between molecular sequences (nucleic acids and proteins) and to build phylogenetic trees [18,19]. A variety of statistical models have been proposed (from the JC69 and K80 models to the UNREST model), which differ in their assumptions about the exchange rates between nucleotides in the transition matrix [18,19]. Moreover, the rates, spectrum and evolution of mutation have been found to be species-specific, distributed over a wide range (10^-10 to 10^-7), and selectively maintained [20-25]; they also depend on nucleotide type and on the status of the studied organism (health, nutrition, environment, etc.) [26-31]. Novel insights into mutation and its role in population genetics [32,33] and in disease incidence (especially driver mutations in cancer progression) have been obtained from large-scale cancer genomics and in-depth data-mining studies [34-38].

In the present study, we examined the pattern of LD distribution assuming different rates for different types of nucleotide mutations (transition or transversion), adopting an approach similar to the K80 model. We found that under the assumed range of mutation rates (i.e. the weak scale of 10^-9 to 10^-8), the patterns and dynamics of LD distribution were further affected, but only to a minor extent, when effective population size, recombination and selection were considered in turn. Furthermore, when randomness was added to a simple model, the LD distribution was found to fluctuate around the steady state.
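To illustrate the K80-style idea of treating transitions and transversions differently, here is a minimal Python sketch of a one-generation nucleotide transition matrix. The function name `k80_like_matrix`, the parameters `ts_rate` and `tv_rate`, and the nucleotide order T, C, A, G are assumptions for illustration, not the authors' code; the sketch also uses per-generation probabilities rather than the continuous-time rate matrix of the original K80 model.

```python
import numpy as np

# Nucleotide order assumed for this sketch: T, C, A, G.
NUCS = ("T", "C", "A", "G")
# Transitions are pyrimidine<->pyrimidine or purine<->purine changes;
# every other change is a transversion.
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def k80_like_matrix(ts_rate: float, tv_rate: float) -> np.ndarray:
    """Per-generation nucleotide transition matrix with K80-style structure.

    ts_rate: probability of the single transition available from a given base.
    tv_rate: probability of each of the two transversions from a given base.
    Rows index the current base, columns the base one generation later.
    """
    P = np.zeros((4, 4))
    for i, a in enumerate(NUCS):
        for j, b in enumerate(NUCS):
            if i != j:
                P[i, j] = ts_rate if (a, b) in TRANSITIONS else tv_rate
        # The probability of no change fills the diagonal so each row sums to 1.
        P[i, i] = 1.0 - P[i].sum()
    return P


P = k80_like_matrix(ts_rate=1e-8, tv_rate=1e-9)
base_freqs = np.array([0.1, 0.2, 0.3, 0.4])  # T, C, A, G (illustrative values)
print(P.sum(axis=1))    # each row sums to 1 (up to rounding)
print(base_freqs @ P)   # base frequencies after one generation
```

Setting `ts_rate` equal to `tv_rate` collapses the matrix to the JC69-style case of a single shared rate.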

Terms and definitions
Factors such as genetic drift, recombination, selection and mutation affect the distribution of LD (r^2) [39]. Two statistics widely used to quantify LD at two loci (diallelic system) are: [...] types and rates of mutations into the model, and also considered the randomness of mutation in a simple model.
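As a sketch of two standard two-locus measures, the coefficient of linkage disequilibrium D = p11*p22 - p12*p21 and the squared correlation r^2 = D^2 / (pA1*pA2*pB1*pB2) can be computed from the four haplotype frequencies of a diallelic system as below. The function name `ld_stats` and the haplotype ordering are illustrative assumptions, and treating these two measures as the ones listed in the text is also an assumption.

```python
def ld_stats(p11: float, p12: float, p21: float, p22: float) -> tuple[float, float]:
    """Return (D, r2) for a two-locus, two-allele system.

    p_ij is the frequency of the haplotype carrying allele i at the first
    locus and allele j at the second locus; the four values must sum to 1.
    """
    if abs(p11 + p12 + p21 + p22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    pa1 = p11 + p12            # frequency of allele 1 at the first locus
    pb1 = p11 + p21            # frequency of allele 1 at the second locus
    d = p11 * p22 - p12 * p21  # equivalently p11 - pa1 * pb1
    denom = pa1 * (1.0 - pa1) * pb1 * (1.0 - pb1)
    # r^2 is undefined when either locus is monomorphic; report 0 in that case.
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


print(ld_stats(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium: D = 0, r^2 = 0
print(ld_stats(0.4, 0.1, 0.1, 0.4))      # D ~ 0.15, r^2 ~ 0.36
```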

For two linked loci, each with alleles 1 and 2, the forward mutation rate from allele 1 to allele 2 at the first locus and the rate of the reverse mutation are specified separately. Likewise, the mutation rate from 1 to [...]

Then, the four haplotype probabilities following mutation can be computed as: [...]

where the transition matrix, built according to the four possible situations described above, is: [...]

Furthermore, to incorporate selection in the model, a selection coefficient that reduces the frequency of allele 1 at the causal locus is considered.
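A minimal Python sketch of how a two-locus mutation transition matrix with locus-specific forward and reverse rates could be assembled and applied, followed by a selection step. It assumes that mutations at the two loci occur independently within a generation (so the 4x4 matrix is the Kronecker product of two 2x2 per-locus matrices), a haplotype order of 11, 12, 21, 22, and haploid selection against allele 1 at the first locus. The rate names `u1`, `v1`, `u2`, `v2`, the coefficient `s` and all function names are hypothetical, not the paper's notation.

```python
import numpy as np


def locus_matrix(forward: float, reverse: float) -> np.ndarray:
    """2x2 per-generation mutation matrix for one diallelic locus.

    forward: probability that allele 1 mutates to allele 2.
    reverse: probability that allele 2 mutates back to allele 1.
    """
    return np.array([[1.0 - forward, forward],
                     [reverse, 1.0 - reverse]])


def two_locus_matrix(u1: float, v1: float, u2: float, v2: float) -> np.ndarray:
    """4x4 haplotype transition matrix, haplotype order 11, 12, 21, 22.

    Assumes mutations at the two loci occur independently within a generation,
    so the joint matrix is the Kronecker product of the per-locus matrices.
    """
    return np.kron(locus_matrix(u1, v1), locus_matrix(u2, v2))


def select_against_allele1(p: np.ndarray, s: float) -> np.ndarray:
    """Hypothetical haploid selection: haplotypes carrying allele 1 at the
    first (causal) locus have relative fitness 1 - s."""
    w = np.array([1.0 - s, 1.0 - s, 1.0, 1.0])
    weighted = p * w
    return weighted / weighted.sum()


p = np.array([0.4, 0.1, 0.1, 0.4])                       # haplotype frequencies
T = two_locus_matrix(u1=1e-8, v1=1e-9, u2=1e-9, v2=1e-8)
p_star = p @ T                                            # after mutation
p_star2 = select_against_allele1(p_star, s=0.01)          # after selection
print(p_star, p_star2)
```

The Kronecker product is a convenient way to combine per-locus matrices when the loci mutate independently; a model with correlated mutation across loci would need the full 4x4 matrix written out directly.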

Conditional on the haplotype probabilities p11**, p12**, p21** and p22** [...]

Scenario I: effective population size (Ne)
We found that different mutation rates and their combinations could affect the distribution of LD. However, even though the effective population sizes differed, the same final levels of LD were eventually reached (Fig. 1).

Regardless of Ne, by looking at each panel in Fig. 1, we could easily see that [...]

In addition, when one mutation rate was set at 10^-8 and the other at 10^-9, or vice versa, the LD values lay between the two curves described above (the green and red lines overlapped).

Considering effective population sizes (Ne, from 15 to 45 in intervals of 10), we could see that the larger the Ne, the lower the LD can get; that is, a smaller population size can lead to larger LD values (Fig. 1A). In addition, as Ne increased, the number of generations needed to reach the final steady state grew, from ~]00 generations at Ne = 15, to ~1000 generations at Ne = 25, ~1200 generations at Ne = 35, and ~1500 generations at Ne = 45 (Fig. 1A-D).
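For intuition about why a smaller Ne yields larger LD, a classical approximation (Sved's drift-recombination result, not the transition-matrix model used in this study) gives the expected equilibrium r^2 as 1/(1 + 4*Ne*c), where c is the recombination fraction. The sketch below evaluates it for the Ne values above; the value c = 0.01 and the function name are assumptions for illustration.

```python
def sved_expected_r2(ne: float, c: float) -> float:
    """Sved's (1971) approximation to equilibrium E[r^2] under drift and recombination.

    ne: effective population size; c: recombination fraction per generation.
    """
    return 1.0 / (1.0 + 4.0 * ne * c)


c = 0.01  # assumed recombination fraction, for illustration only
for ne in (15, 25, 35, 45):
    print(f"Ne = {ne}: E[r^2] ~ {sved_expected_r2(ne, c):.3f}")
```

Under this assumption the expected r^2 falls from 0.625 at Ne = 15 to about 0.357 at Ne = 45, the same direction as the trend described above.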

Besides effective population size, we further added the recombination rate into the framework to study their combined effects on LD distribution. The recombination rate appeared to strongly affect both the LD distribution and the final level of LD at the steady state (Fig. 2). Generally, within each horizontal panel (Fig. 2A, B and C), we found that the larger the recombination rate, the smaller the LD.
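In an infinite, randomly mating population, recombination alone shrinks D by a factor of (1 - c) each generation, so larger recombination rates erode LD faster. The Python sketch below checks this with the standard haplotype-frequency update; it ignores drift, mutation and selection, and the function names are hypothetical.

```python
import numpy as np


def recombine(p, c: float) -> np.ndarray:
    """One generation of recombination in an infinite, randomly mating population.

    p: haplotype frequencies ordered 11, 12, 21, 22; c: recombination fraction.
    A fraction c of gametes are recombinants, whose haplotype frequencies
    equal the products of the allele frequencies.
    """
    p = np.asarray(p, dtype=float)
    pa1, pb1 = p[0] + p[1], p[0] + p[2]
    linkage_eq = np.array([pa1 * pb1, pa1 * (1 - pb1),
                           (1 - pa1) * pb1, (1 - pa1) * (1 - pb1)])
    return (1.0 - c) * p + c * linkage_eq


def coefficient_d(p) -> float:
    return p[0] * p[3] - p[1] * p[2]


p0 = np.array([0.4, 0.1, 0.1, 0.4])
for c in (0.001, 0.01, 0.1):
    p = p0.copy()
    for _ in range(100):
        p = recombine(p, c)
    # D after 100 generations should match (1 - c)**100 * D0.
    print(c, coefficient_d(p), (1 - c) ** 100 * coefficient_d(p0))
```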

In addition, a three-dimensional view of the effects of mutation and recombination on LD distribution is presented to better show its dynamic changes (Fig. 4). Similar [...] changing also in a random way (Fig. ]). In general, the LD distribution still showed a pattern similar to previous results (Figs. 2 and 5), regardless of how randomness was introduced into the modeling process. However, the LD curve was no longer smooth and instead showed increasing ruggedness.
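As a hypothetical illustration of one way to inject randomness into such an iteration, the Python sketch below redraws all four mutation rates every generation, log-uniformly between 10^-9 and 10^-8 (an assumption chosen to match the weak scale studied here), and then applies mutation followed by recombination. The function `simulate_random_mutation`, its parameters and the random scheme are assumptions, not the paper's model.

```python
import numpy as np


def locus_matrix(forward: float, reverse: float) -> np.ndarray:
    """2x2 mutation matrix: allele 1 -> 2 with `forward`, 2 -> 1 with `reverse`."""
    return np.array([[1.0 - forward, forward], [reverse, 1.0 - reverse]])


def simulate_random_mutation(p0, c, generations, seed=0, low=1e-9, high=1e-8):
    """Iterate mutation then recombination, redrawing the mutation rates each generation.

    p0: haplotype frequencies ordered 11, 12, 21, 22; c: recombination fraction.
    Rates are drawn log-uniformly from [low, high] (an assumption for this sketch).
    Returns D for generation 0 and after each generation.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    d_values = [p[0] * p[3] - p[1] * p[2]]
    for _ in range(generations):
        # Random forward/reverse rates for each locus in this generation.
        u1, v1, u2, v2 = 10.0 ** rng.uniform(np.log10(low), np.log10(high), size=4)
        T = np.kron(locus_matrix(u1, v1), locus_matrix(u2, v2))
        p = p @ T  # mutation (loci assumed to mutate independently)
        pa1, pb1 = p[0] + p[1], p[0] + p[2]
        eq = np.array([pa1 * pb1, pa1 * (1 - pb1),
                       (1 - pa1) * pb1, (1 - pa1) * (1 - pb1)])
        p = (1.0 - c) * p + c * eq  # recombination
        d_values.append(p[0] * p[3] - p[1] * p[2])
    return np.array(d_values)


traj = simulate_random_mutation([0.4, 0.1, 0.1, 0.4], c=0.01, generations=200, seed=42)
print(traj[:3], traj[-1])
```

Fixing `seed` makes a run reproducible, so a random-rate trajectory can be compared directly with one computed using constant rates.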

Supplementary Files
This is a list of supplementary files associated with this preprint.
TableS1S3.docx