A selection pressure landscape for 870 human polygenic traits

Characterizing the natural selection of complex traits is important for understanding human evolution and both biological and pathological mechanisms. We leveraged genome-wide summary statistics for 870 polygenic traits and attempted to quantify signals of selection on traits of different forms in European ancestry across four periods in human history and evolution. We found that 88% of these traits underwent polygenic change in the past 2,000–3,000 years. Recent selection was associated with ancient selection signals in the same trait. Traits related to pigmentation, body measurement and nutritional intake exhibited strong selection signals across different time scales. Our findings are limited by our use of exclusively European data and the use of genome-wide association study data, which identify associations between genetic variants and phenotypes that may not be causal. In sum, we provide an overview of signals of selection on human polygenic traits and their characteristics across human evolution, based on a European subset of human genetic diversity. These findings could serve as a foundation for further populational and medical genetic studies. Song et al. quantify the signal of natural selection on 870 complex traits in European individuals, finding that 88% of traits showed signals of selection in the past 3,000 years, including traits related to pigmentation, body shape and food intake.

T he genetic architecture of present-day humans is shaped by selection pressures in our history 1 . Understanding the patterns of natural selection in humans can provide valuable insights into the mechanisms of biological processes 2 , the origin of human psychological characteristics 3 and key anthropological events 4 . With regard to public health and clinical medicine, the study of evolution promotes knowledge of disease mechanisms and susceptibility 5,6 , and aids precision medicine by highlighting genetic variants 7 . Therefore, the explosive growth of all branches of anthropology, biology and medicine demands an improved understanding of natural selection, for both heritable diseases and nondisease traits.
Quantifying selection pressure, especially on human polygenic traits, is a complex and challenging task 1 . Unlike simple traits dominated by a single gene or variant, selection pressure on complex traits often results in polygenic adaptation 8 , where subtle modification of a large number of variants influences phenotypic alteration. Polygenic adaptation could result from different forms of selection such as purifying selection, balancing selection, and hard and soft sweeps 1,8 . Furthermore, revolutions of culture and productivity in human history have influenced selection pressures on human society 9 , which have led to distinct patterns of adaptation at different time scales. Undoubtedly, comprehensive understanding of natural selection should cover all these aspects. So far, only a few studies have managed to generate a multi-aspect picture of selection pressure for single polygenic traits, such as attention deficit hyperactivity disorder 10 and schizophrenia 11 . However, an overall picture covering different types of human traits is still lacking.
With the tremendous advancement of genome-wide association studies (GWAS) 12 and various efficient analytical tools for population genetics 13 , we can now study the selection pressure of human polygenic traits from a multi-dimensional perspective. Here, we leverage GWAS summary statistics of 870 traits to achieve an overview. As shown in Fig. 1, we focus on two primary goals. First, we describe the selection pressure on each trait at four different time scales (Figs. [2][3][4][5]. This is achieved using various metrics derived from different statistical models (Mendelian randomization (MR), singleton density score, ancient genome analysis and so on), each fitting a specific timeframe or form of selection. Second, we integrate these metrics to explore the association among selection pressures, trait characteristics and functional genomic patterns (Figs. [6][7][8], using linear regression and unsupervised clustering.

Results
By running suitable filters (Methods) in the traitDB 12 database combined with literature curation (Methods), we collected GWAS summary statistics for 870 polygenic traits with adequate power (N > 10,000 and single-nucleotide polymorphism (SNP)-based heritability (h 2 ) > 0.01), 738 of which were carried out as part of the UK Biobank initiative 14 . These traits were defined as either disease (encompassing diseases such as Crohn's disease, disorders such as attention deficit hyperactivity disorder, and potentially pathological conditions such as high cholesterol) or non-disease, then separated into 15 categories, such as body measurement and dermatology (Supplementary Table 1 and Fig. 1). To evaluate signals of selection on these traits, we adopted different evaluation techniques (Methods) to quantify signals of selection for four time scales: the present day, recent history (2,000-3,000 years ago), the pan-Neolithic period (the data we use for this period range from ~45,000 to 3,400 years ago) and the time since human speciation ( Fig. 1).

Body measurements and contemporary reproductive success.
Our analysis started by exploring natural selection pressure at the present time. We hypothesized that the current natural selection of a trait is relevant to whether it could causally impact human reproductive success (that is, number of offspring) and mating success (for this, we used the proxy of number of overall sexual partners). To quantify these causal effects, we applied MR on GWAS summary statistics between tested traits and reproductive success, as well as between tested traits and mating success. At the significance cutoff of |z MR | > 4 (Methods), we found that 7.4% of traits with valid MR results (that is, traits passing sensitivity analysis) (40 out of 539) had a causal effect on the number of offspring of males, whereas 5.9% (32 out of 542) of traits with valid MR results impacted the number of offspring of females (Supplementary Table 2). Separating the traits into 15 categories (Fig. 2a,b), we observed that 52% (23/44) of anthropometric body measurement traits such as height (z MR = 8.09, P = 3.33 × 10 −16 in males; z MR = 4.91, P = 4.55 × 10 −7 in females) were causally related to the number of offspring of males. By contrast, only 30% (14/47) of body measurement traits were causally related to the number of offspring of females. In addition, the effect of another type of body measurement (dermatology traits such as skin colour) on reproductive success also exhibited sex specificity: 38% (5/13) of dermatology traits influenced the number of offspring of males, but none affected the number of offspring of females. However, when testing for 112 complex conditions such as schizophrenia 11 and stroke 15 , polygenic risks showed no significant causal effect on the numbers of offspring for either males or females (nominal P > 0.05/112). The distribution of effect direction was also similar between disease and non-disease traits (Fisher P = 0.40 for males, P = 0.71 for females).
For mating success ( Supplementary Fig. 2), body measurement traits also had an impact: 44% of body measurement traits impacted the number of sexual partners of males, compared with 12% affecting the number of sexual partners of females. Interestingly, among all 112 tested polygenic disease traits, schizophrenia (z MR = 7.37, P = 8.53 × 10 −14 ) and attention deficit hyperactivity disorder (z MR = 4.62, P = 1.92 × 10 −6 ) increased the number of sexual partners of males, in line with previous findings that increased genetic liability for schizophrenia does not confer a fitness advantage but does increase mating success 16 . For males, the impact on reproductive success of a trait was positively correlated with its impact on mating success (Supplementary Fig. 2; Pearson correlation coefficient (PCC) 0.47, 95% CI 0.39 to 0.55, P = 9.30 × 10 −31 ). However, this was not true for females, for whom the impact on reproductive success of a trait was negatively correlated with its impact on mating success (Supplementary Fig. 2; PCC −0.10, 95% CI −0.20 to 0, P = 0.02). This discrepancy is consistent with the evolutionary psychology theory that males and females adopt distinct sexual strategies that shape assortative selection 17 .
In addition, we applied causal analysis using summary effect estimates 18 to all MR results to analyse the role of genetic correlation. We found that most of the results were explained mainly by causal effects instead of genetic correlation. Using another GWAS 19 dataset and applying MR bias estimation 20 , we again showed that our results were not explained by GWAS sample overlap ('MR analysis details' in Supplementary Information).
Widespread polygenic adaptation in the past 2,000-3,000 years. Next, we extended our analysis to recent human history (2,000-3,000 years ago to the current time, Fig. 3a-c). We calculated the Spearman correlation between the P value of SNP-trait association and trait-enhancing singleton density scores (tSDS), defined as ρ SDS , developed by Field et al. 21 (Fig. 3c). As expected, the forward simulation (Methods and Supplementary Fig. 3) supported this concept: under neutral demographic history, including population stratification and genetic drift, tSDS generally had no association with GWAS P values (median ρ SDS z score 0.19, P = 0.42). However, when polygenic adaptation was added, ρ SDS deviated from zero (median z score −2.18, P = 0.01, Supplementary Fig. 4). Thus, a significant non-zero ρ SDS confirmed the existence of selection pressure and would not support the possibility of neutral evolution or impacts of population stratification. At the significance threshold of P < (0.05/870 = 5.7 × 10 −5 ), we found that 88% (761/870) of polygenic traits had a significant correlation between the GWAS P value and tSDS (ρ SDS ; Supplementary Table 3). Previous analysis has found that population stratification of UK Biobank might bias the estimated polygenic adaptation 22 . Thus, to exclude this potential confound in our analyses, we applied another method with a different statistical model, which involves reconstructing the history of polygenic scores (RHPS) 23 , based on RELATE 24 (RHPS-RELATE, Methods). We set the reference panel as all European participants of 1000 Genomes to avoid population stratification. As shown in Supplementary Table 3, the polygenic risk score (PRS) alteration in the past 100 generations (roughly equivalent to 2,800 years (ref. 24 )) was mostly in accordance with ρ SDS (PCC 0.25, 95% CI 0.18 to 0.32, P = 3.96 × 10 −13 ). Among 755 traits with significant non-zero ρ SDS , 13.8% (104/755) showed a consistent significant alteration of PRS (P for 'Tx test' from RHPS < 0.05/870, Methods), and 26.1% (197/755) showed a nominally significant alteration (P for Tx test <0.05). Notably, our RHPS-RELATE results also highlighted those traits with the highest ρ SDS , such as ease of skin tanning (P for ρ SDS <10 −100 ; P for Tx test <10 −100 ) and raw vegetable intake (P for ρ SDS <10 −100 ; P for Tx test 2.69 × 10 −51 ) (Supplementary Table 3). In general, the results of RHPS-RELATE were consistent with the ρ SDS analysis, albeit at lower statistical power. Thus, we conclude that the ρ SDS results are credible and can truly reflect recent adaptation prevalence. In the next section, we continue to treat ρ SDS as the main result.

Hunter-gatherer ancestry impacted selection around Neolithic.
To quantify the selection pressure during the pan-Neolithic period 25 , we downloaded three ancient human genome datasets (Neolithic (8,000-4,200 years ago) 26 , pre-Neolithic (~45,000-7,000 years ago) 27 and Near East farmer (~14,000-3,400 years ago) 28 ; Supplementary Table 4) and calculated the polygenic burden (measured by both allele counts and polygenic scores; Methods and Supplementary Fig. 6) for each of 870 traits on all individuals 10 . If one trait went through polygenic adaptation, its polygenic burden would alter over time. Consequently, when we regressed the polygenic burden against samples' age, we expected to obtain significant regression coefficients for such traits. Additionally, the per cent of hunter-gatherer ancestry was also included in the regression due to its profound impact on human evolution. With forward simulation (Methods and Supplementary Fig. 3), we showed that such linear regressions did not introduce false-positive results under population heterogeneity or technical covariates (median regression t = 0.04 under neutral evolution, P = 0.97), and could accurately capture the effects of polygenic adaptation (median regression t = −5.58, P = 1.20 × 10 −8 under polygenic adaptation; Supplementary Fig. 6). As shown in Fig. 4a and Supplementary Table 5, after controlling for covariances (for example, latitude, longitude and genotyping coverage) and multiple tests, the polygenic burden of 78 traits was significantly associated with the percentage of hunter-gatherer ancestry (%HG). By contrast, another six traits, such as denture usage, were associated with time in at least one of three datasets. Seven of 13 dermatology traits were most predominantly associated with %HG (Fig. 4a), with 'ease of skin tanning' as the most significant example (regression t HG = 20.3, P = 1.74 × 10 −38 ; Fig. 4b). In the Near East dataset, we observed that signals of selection on skin tanning varied by latitude (Fig. 4c), with signals of positive selection observed in regions of low latitude (latitude < 50°; t = 4.12, P = 1.91 × 10 −5 ), but signals of negative selection observed at high latitudes (t = 4.95, P = 3.80 × 10 −7 ). After controlling for the impact of latitude, we observed a general ascending trend for 'ease of skin tanning' for the Near East dataset, suggesting overall positive selection (regression t NearEast = 5.81, P = 2.29 × 10 −8 ; Fig. 4c). We also found a nominally significant increment for ease of skin tanning in the pre-Neolithic period (regression t pre-Neolithic = 4.25, P = 1.11 × 10 −5 ), but not in the Neolithic period (regression t Neolithic = 0.92, P = 0.18; Supplementary Fig. 7).
When analysing the regression t statistics for all traits (Fig. 4d), we found that regression t HG was positively associated with regression t NearEast (PCC 0.55, 95% CI 0.49 to 0.61, P = 6.27 × 10 −69 ) and regression t pre-Neolithic (PCC 0.61, 95% CI 0.55 to 0.67, P = 4.59 × 10 −89 ), but was negatively correlated with regression t Neolithic (PCC −0.29, P = 1.50 × 10 −17 ). This result suggested that traits related to hunter-gatherer ancestry were favoured by natural selection in the pre-Neolithic period and Near East farming societies but were suppressed by natural selection during the Neolithic period. This trend was also observed for polygenic disease traits, albeit with inconsistent statistical power (PCC 0.18, 0.48 and −0.23; 95% CI −0.08 to 0.38, 0.31 to 0.63 and −0.38 to −0.02; P = 0.06, 3.62 × 10 −6 and 0.01 for the three periods, respectively; green points and texts in Fig. 4d). As shown in the diagonal plots in Fig. 4d, we also found that polygenic disease traits generally show evidence of more negative selection pressure than non-disease traits. This was most significant in Near East farming societies (median t NearEast = −0.66, permutation   However, we still observed that 13 disease traits showed signals of positive selection in the pan-Neolithic period (that is, t NearEast , t pre-Neolithic or t Neolithic > 0 and P < 0.05), including immunological diseases such as Crohn's disease (t pre-Neolithic = 2.86, P = 0.013), atopic dermatitis (t Neolithic = 2.61, P = 0.01) and periodontitis (t pre-Neolithic = 2.48, P = 0.029) ( Supplementary Fig. 7).

Heritability enrichment around genomic selection signals.
To expand our analysis to a more ancient time scale, we annotated the genomic regions undergoing different forms of selection at different times using multiple statistical models 11,29-33 , including average ascertained sequentially Markovian coalescent (ASMCavg), mutation-sensitive genes, and conserved regions. Then, we applied linkage disequilibrium (LD) score regression (LDSC) 34 to see whether the heritability of each trait was enriched in any of the annotated genomic regions mentioned above (after controlling the effects of every other region; Methods and Supplementary Fig. 8), which was considered an indicator that the trait has undergone natural selection 11 . As shown in Fig. 5a and Supplementary Table 6, we detected widespread heritability enrichment in genomic regions with low ASMCavg, indicating background selection across genomic regions during the past hundreds of thousands of years. The P value by LDSC was significantly increased compared with the null distribution (P < 10 −100 ; Fig. 5b). We also observed heritability enrichment around mutation-sensitive genes, in regions with low LD and regions with high conservation 34 . We observed that traits showing the highest enrichment in low-ASMCavg regions included body water mass (z = −7.32, P = 1.24 × 10 −13 ), intelligence (z = −5.65, P = 1.24 × 10 −13 ) and schizophrenia (z = −5.55, P = 1.43 × 10 −8 ) (Fig. 5c). Similar trait heritability enrichment, such as for schizophrenia (z = 6.23, P = 2.33 × 10 −10 ) (Supplementary Fig. 9), was also observed around mutation-sensitive genes. Consistent with the background selection explained above, variants with high deleteriousness (measured by Combined Annotation Dependent Depletion (CADD)) were also significantly associated with polygenic traits (Supplementary Fig. 9). We found that the heritability for traits such as large artery strokes (z = 8.84, P = 4.79 × 10 −19 ) and ever been drinkers (z = 7.68, P = 7.95 × 10 −15 ) were explained mostly by high-CADD variants whose alternative alleles enhanced the traits (namely, CADD_trait-Enh variants) ( Supplementary Fig. 9). We also analysed other forms of selection (Methods and Supplementary Table 6), and found that the heritability of large artery strokes was significantly enriched in regions undergoing a soft sweep (z = 4.08, P = 2.25 × 10 −5 ). By contrast, the heritability of beer intake was enriched in genomic regions influenced by balancing selection (z = 3.83, P = 6.41 × 10 −5 ).
By comparing the number of traits reaching the significance threshold for each genomic annotation between disease and nondisease traits ( Supplementary Fig. 10), we found that CADD_trait-Enh variants were predominantly associated with polygenic disease traits (OR 9.58, 95% CI 5.53 to 16.61, P = 5.69 × 10 −6 ). CADD_ traitEnh variants' z scores for disease traits were also generally greater than those for non-disease traits (permutation P = 0.0002, Supplementary Fig. 10). Surprisingly, conditions such as atrial fibrillation (z = 3.45, P = 0.0003), anorexia nervosa (z = 3.35, P = 0.0004) and rheumatoid arthritis (z = 3.16, P = 0.0008) showed heritability enrichment in deleterious variants whose alternative     Table 6). This result suggested that the risks of these three conditions might be increased by negative selection through elimination of their protective alleles. Additionally, we found that conserved regions (permutation P = 0.003, Supplementary Fig. 10) and low-ASMCavg regions (permutation P = 0.07, Supplementary  Fig. 10) tended to contribute to the non-disease traits.
Facial shape shows selection signal since human speciation. Finally, to analyse overall selection pressure since human speciation, we followed Esteller-Cucala et al. 10 and calculated the percentage of trait-enhancing derived alleles in all trait-associated SNP, named traitEnhDA%. Specifically, if one trait was favoured by natural selection, newly emerged alleles that enhance this trait would have a better chance to survive and spread throughout the population. Thus, traitEnhDA% would be larger than 50% (random expectation). As expected, our forward simulation results supported this hypothesis (median traitEnhDA% of 61% under adaptation versus 49.9% under neutral evolution; Supplementary Fig. 11). Among all 870 tested traits, facial shape measurements showed the most significant signals (Fig. 5d and Supplementary Table 7): 11 out of 17 facial shape measurements had significantly larger traitEnhDA% than expected (binomial P < 0.05/870). Specifically, the distance between the lower lip margin, known as labiale inferius, and three nose landmarks had the highest traitEnhDA% of all traits ( Fig. 5d; traitEnhDA% > 60%, binomial P < 10 −25 ). It is worth noting that other distances between face landmarks also had trait-EnhDA% > 50%, albeit at lower statistical significance. This result suggests that natural selection tended to enhance distance among facial landmarks. Aside from facial measurements, there were another 17 traits where traitEnhDA% differed significantly from 50% (Fig. 5e), such as pigmentation-related traits which were significantly suppressed by selection (such as ease of skin tanning with traitEnhDA% of 45% and black hair with traitEnhDA% of 46%).
Selection pressure was associated with ancient selection. Having quantified signals of selection pressure at different time scales using different evaluation metrics, we attempted to assess the potential factors impacting natural selection measurements. We first analysed whether basic GWAS covariates significantly influenced the 16 selection metrics that we calculated in different human development times (Fig. 6a). We regressed the absolute value of these 16 metrics against the following measurements: trait heritability (h 2 ), the natural log of GWAS sample size (lg(N)), enrichment . Bar plots denoted the R 2 for corresponding linear regression. b, Similar to a, but for binary traits. c, Heatmap showing the t value of linear regression, using ancient selection metrics (columns) to predict recent selection metrics (rows). Abbreviation as in Fig. 5. Ncm, number of children of male; Ncf, number of children of female; sds, singleton density score; Pre-Neo, pre-Neolithic; %HG, percentage of hunter-gatherer ancestry; cms, composite of multiple signals; Bal.sel, balancing selection; %DA, percentage of trait-increasing derived allele; prop, proportion of case; h2, heritability; asmc, ascertained sequentially Markovian coalescent. in brain-expressed genes (named 'brain'), whether GWAS was a meta-analysis of multiple populations (named 'meta') and the proportion of cases (named 'prop'; only for binary traits). As shown in Fig. 6a,b, metrics on GWAS with a larger sample size N generally had a larger magnitude (regression t for lg(N) of −2.7 to 16.2; 78% of t > 0). This was also true for binary-value traits with a higher proportion of cases (94% of t for 'prop' > 0) and quantitative traits with higher heritability (75% of t for h 2 >0). Among all metrics, z ASMC was impacted mostly by these covariates (R 2 = 0.43 and 0.24 for quantitative and binary-value traits, P < 0.001). Since the influence of covariates was not neglectable, we adjusted all metrics prior to all downstream analyses (Methods).
We then applied linear regression on the scaled and adjusted selection metrics (Supplementary Tables 8-10) to analyse the associations between them. We reasoned that, if environmental pressure were identical throughout human history, the strength of selection at a later time would be associated with that at earlier times. As shown in Fig. 6c, as expected, ρ SDS was associated mostly with ancient selection metrics (R 2 = 0.54, F test P < 10 −100 ), especially on HG% (regression P = 1.50 × 10 −29 ) and t NearEast (regression P = 2.69 × 10 −22 ). R 2 for the other six metrics was moderate (R 2 = 0.11-0.22, F test P < 2.2 × 10 −16 ). However, z ncm and z ncf were negatively associated with traitEnhDA% (regression t = −3.20 and −1.12), whereas all other metrics had a positive association with traitEnhDA% (t = 1.48-6.81). Since traitEnhDA% represented overall selection since human speciation, we consequently hypothesized that, while selection pressure might be undergoing some kind of alteration at the present time, it is generally associated with a more ancient selection.
To test this hypothesis and gain a continuous view of adaptation trajectories, we applied RHPS-RELATE to infer the population-mean PRS trajectory of each trait, then applied time series hierarchical clustering on dynamic time warping 35 to elucidate its pattern. As shown in Fig. 7 and Supplementary Table 11, the trajectories of 434 and 308 traits were grouped into clusters 1 and 2, respectively, which generally showed accelerating monotonic increasing or decreasing trends from about 500 generations ago. Typical examples were raw vegetable intake ( Supplementary  Fig. 12, P by Tx test 23 between 496 and 96 generations <10 −100 ) and atopic dermatitis ( Supplementary Fig. 12, P by Tx test between 496 and 96 generations <10 −100 ). On the other hand, 13 and 10 traits were grouped into cluster 3 and 4, respectively, characterized by a sharp turnover of adaptation directions around 133 generations ago (Fig. 5d). These traits included intelligence and insomnia ( Supplementary Fig. 12). Taken together, most of the tested traits experienced consistent selection pressure in the past, and only a few exceptions had undergone alteration of selection direction.

Functional genomic architectures impact selection pressure.
Besides the impacts of ancient selection on recent selection, we were also interested in the impact of genetic architectures on selection pressure. Thus, we applied step-wise linear regression on each selection pressure metric to explore whether the genetic characteristics (for example, functional genomics enrichment or variant deleteriousness) could influence the selection pressure of the trait. We found that functional genomic patterns explained 3% (trait-EnhDA%) and ~20% (z ASMC ) of the variance in selection pressure (R 2 = 0.03-0.20; Fig. 8a). Adding conservation annotations (annotations directly related to natural selection such as LLD and allele age 34 ) into the model increased R 2 by up to 0.3 (ρ SDS ). This increment was mainly driven by the inclusion of CADD_traitEnh and CADD_traitWeak variants; for the model of ρ SDS , CADD_traitEnh and CADD_traitWeak variants increased R 2 by 0.29.

Discussion
We quantified signals of selection on human polygenic traits at four different time scales across human evolution using MR, singleton density score, ancient genome analysis and heritability partition. We examined the characteristics of signals of selection, such as their prevalence and strength, uneven distribution among time points and trait categories as well as their association with genetic architectures. By analysing the tSDS correlation and PRS trajectory, we showed evidence consistent with widespread recent polygenic adaptation among different kinds of traits. The observation that polygenic adaptation was common among complex traits has often been questioned by researchers 8 because (1) population stratification is known 22 to inflate the signal of ρ SDS and (2) existing studies on polygenic adaptation usually focus on single traits 36,37 . In our study, the forward simulation and the use of RHPS-RELATE provide evidence to inform this debate. First of all, by utilizing simulations of genetic drift and demographic isolation strategies, our results suggest that population stratification did not drive a systematic bias on ρ SDS . We consequently propose that the observed bias on height might not represent the majority of traits. Second, false-positive ρ SDS findings were mainly driven by a large number of SNPs with small effects 22,24 , whereas RHPS-RELATE only analysed top loci with large effects 23,24 . Third, we included various European populations from 1000 Genomes 38 in RHPS-RELATE instead of using only a UK sample 21 and applied a different statistical test (Tx test from RHPS) to analyse the significance of adaptation. Because ρ SDS , forward simulation and RHPS-RELATE all gave convergent results, we suggest that widespread recent polygenic adaptation is plausible.
One of our most interesting results was the finding that pigmentation, body measurement and dietary traits were continuously under intense selection pressure across various human development time scales. Pigmentation is one of the most thoroughly studied examples of human evolution. The tremendous spatiotemporal variations of skin colour reflect the complex balancing between ultraviolet damage, vitamin D requirements and heat regulation 39 . With the ease of skin tanning as an example, our results also revealed a complex evolutionary history of pigmentation: we find evidence for selection for dark skin both before the Neolithic period and in recent history but inconsistent selection during the Neolithic period. Body size and dietary habits, on the other hand, were mostly shaped by trade-offs among energy allocations to growth, maintenance, digestion and other functions 40,41 . Our result also suggested that, among other factors that might impact energy allocation, such as ecology, climate and migration, genetic factors influenced the evolution of body measurement and nutrition intake traits.
Another question we investigated was why polygenic disease traits were not eliminated by natural selection. Conditions including anorexia nervosa and inflammatory bowel disease showed a signal of positive adaptation across human history. In other literature, researchers have suggested that foraging for food is typical  behaviour when facing the threat of starvation and thus will be favoured by selection during periods of food supply shortage, and this is one evolutionary hypothesis for anorexia nervosa 42 . For inflammatory bowel disease, researchers have suggested that disease vulnerability may be associated with high defence against pathogens, which may have provided survival advantages in ancient societies with poor sanitary conditions 43 . These results highlight the potential role of balancing selection in human history: when survival advantage was accompanied by high disease burden, the antagonistic force of selection may have pulled associated genetic variants to an optimal midpoint. However, despite the few cases of disease presented in our study, our results suggest that the majority of the disease genetic burden was indeed being suppressed by natural selection yet persisted in the human gene pool. Several hypotheses could explain this. One theory, the reproductive advantage theory 16 , suggests that high-risk but no-onset individuals might have a reproductive advantage and raise more offspring, which might prevent the elimination of risk alleles in the population. However, after P value adjustment for multiple testing, our MR analysis found no significant association between reproductive success and disease genetic risks. Another theory, the Hill-Robertson effect 44 , proposes that genomic features, such as low allele age and low recombination rate 45 , promote the spread of risk alleles during genetic drift 11 . Although our analysis confirmed that disease heritability was indeed enriched in these genomic annotations, the extent of enrichment was not significantly greater than that of non-disease traits. Consequently, although all the above possibilities explained a proportion of disease heritability, there is still room for another 'trivial explanation': natural selection was indeed eliminating the risk alleles but simply not fast enough, due to the small effect of each allele and the small effective population size at the risk loci 8,46 .
An important limitation of this work is that, because of the composition of currently available large-scale GWAS, which predominantly include European participants 47 , especially in the case of UK Biobank 14 , we did not use data from wider, more extensive multi-ethnic GWAS-based genetic studies. Thus, our results had insufficient statistical power to dissect Mainland Europe's sub-population, and even less for a broader population investigation into the rest of the world. The reliance on UK Biobank GWAS might also lead to bias due to imprecise phenotype definition based on self-report questionnaire. In addition, the power to explore more ancient history (more than 100,000 years ago) is also limited since the available tools suitable for such a long time scale could only detect a few sweeps at a single locus 1 . In the future, the further development of multi-ethnic GWAS, ancient human genome analysis, and analytical tools for extended time scales and variable effective population size will help to further uncover the landscape of human evolution.
It should be noted that the findings we report here are also limited by the inherent nature of GWAS, which cannot distinguish causality from association, nor find rare variants that may have large effect sizes, and only explains a limited amount of phenotypic variation.
Nevertheless, we provide an overview of natural selection on human polygenic traits and their characteristics across human evolution, which could serve as a foundation for future studies regarding human genetics and evolution.

GWAS filtration and preprocessing.
We downloaded all GWAS summary statistics from traitDB 12 release 1 (access date April 2020) and retained those conducted solely in cohorts of European ancestry. Since traitDB was released in November 2019, we additionally conducted a literature search for all GWAS of European ancestry published between October 2019 and April 2020. All these GWAS were filtered according to the following criteria: sample size >10,000, SNP-based heritability (h 2 ) calculated by LDSC >0.01 and z score of h 2 > 4. No statistical methods were used to pre-determine sample sizes since this was determined by the original GWAS. For all included polygenic disease traits, we additionally separated them according to onset age: disease traits that preliminarily onset before reproductive age (18 years old) were labelled 'early' , disease traits that preliminarily onset after reproductive age (50 years old) were labelled 'late' and the remaining disease traits were labelled 'lifetime' . Allele frequency for all variants was estimated using all 1000 Genome phase 3 (ref. 38 ) from the European population. Details of data selection and reformatting are provided in 'Choice and reformatting of GWAS included' section of Supplementary Information. Mendelian randomization. To measure human reproductive success and mating success, we downloaded the GWAS summary statistics of the number of offspring of males and females and the number of male and female sexual partners for both sexes from the Benjamin Neale Lab (http://www.nealelab.is/uk-biobank). For each of the 870 traits, we selected SNPs with P < 5 × 10 −8 as the instruments. We retained all instruments presented in the outcome GWAS, then pruned at the threshold of LD > 0.01 in 1000 Genomes. The data harmonization was applied separately for each exposure-outcome pair using the R package TwoSampleMR v0.5.4 (ref. 48 ).
For each exposure-outcome pair, we first calculated the per-instrument MR effects by Wald ratio, then meta-analysed the results for all instruments using three methods: (1) Inverse variance weighted (IVW), which was considered the primary result; (2) the weighted median (WM) method 49 , which was relatively robust when some of the instruments were invalid; (3) Egger regression 50 , which allowed for non-zero directional pleiotropy.

Adjustment of pleiotropy and genetic correlation for MR.
To get rid of the influence of outliers and pleiotropy in a uniform manner, we applied a step-wise outlier removal test for each exposure-outcome pair. Specifically, we first applied three sensitivity tests (Cochran's Q test, Rucker's Q test and Egger intercept test) 51 on all instruments. If P values of any of these tests were <0.05, we applied the MR-PRESSO ('Mendelian Randomization Pleiotropy RESidual Sum and Outlier') outlier test 52 to calculate the observed residual sum of squares (RSS obs ) for all instruments and ranked them in descending order of RSS obs . We removed the topmost instrument and repeated the three MR analyses and three sensitivity analyses on the remaining instruments. If P values of any sensitivity tests were still <0.05, we repeated this procedure by removing the top two, top three, up to top (n − 3) instruments until all sensitivity tests had P values >0.05 (leftmost black point in Supplementary Fig. 13). The MR results at this stage were considered the final result. If P values of any sensitivity tests were <0.05 throughout the procedure, we denoted the MR results for this exposure-outcome pair as NA. For exposure-outcome pairs with fewer than three instruments, we provide the MR results (Wald ratio or IVW) in Supplementary Table 2 but did not consider them in the downstream analysis. After obtaining outlier-free MR results for all pairs, we defined the significant causal effect as following: a z score of IVW estimation >4 or < −4, and estimates of IVW, WM and Egger regressions that were all in the same direction. Those results not reaching the significance criteria were still included in the correlation analysis (Fig. 2c). We also applied causal analysis using summary effect estimates 18 to distinguish causality from genetic correlation (see 'MR analysis detail' section of Supplementary Information). tSDS analysis. The tSDS 21 is a metric that measures the density of singleton mutations around a tested SNP. Based on the assumption that positive selection distorts the ancestral genealogy of haplotype and leads to shorter terminal branches for the favoured allele, tSDS > 0 indicates that the trait-enhancing allele of the tested SNP has an increased frequency in recent history (about 100 generations) 21 .
We ranked all SNPs with derived allele frequency between 0.05 and 0.95 in UK10K data in the ascending order of their GWAS P value and grouped them into consecutive bins of 1,000 SNPs each. We calculated ρ SDS as the Spearman correlation coefficients between the median tSDS for each bin and the order of bins. We applied a jackknife procedure to estimate the confidence interval and statistical significance of ρ SDS ('tSDS analysis' of Supplementary Information).

Polygenic burden of ancient humans.
To analyse the pan-Neolithic trajectory of the polygenic burden for each trait, we downloaded ancient human genotype data for three studies, following Esteller-Cucala et al. 10 : pre-Neolithic 27 (51 individuals), Neolithic 26 (180 individuals) and Near East farmers 28 (281 individuals). The genotype data were transformed into ped and bim files using EIGENSOFT v6.1.4 53 and plink v1.07 54 , with only SNPs that had retained an rs ID. For each trait and each ancient dataset, we retrieved all SNPs with GWAS P < 0.01 and applied LD pruning in the ancient dataset using plink with the parameter '-indep 50 5 2' to obtain a list of independent trait-associated SNPs (tSNP). We excluded individuals with >90% missing information on the tSNP. Similar to the work of Esteller-Cucala et al. 10  trait-enhancing alleles on all non-missing tSNP). As a positive control, we also replaced the allele counts with the polygenic risk scores and repeated the entire analysis (see ' Analysis of ancient human genome' in Supplementary Information).
In each dataset, we fitted a linear model to test for the relation between f and time to the present day, which reflected the polygenic adaptation on the traits. We collected data from the Reich lab and included latitude, longitude, genotyping technique, sex, whether damage restrict was performed on the sample, genotyping coverage, the mixture time of hunter-gatherer and farmer ancestry inferred in the Neolithic dataset, number of SNPs genotyped and the fraction of the library as covariates (some of the covariates were not provided in particular datasets, see ' Analysis of ancient human genome' of Supplementary Information). For the Neolithic dataset, the percentage of hunter-gatherer ancestry (%HG) was also included as a predictive variable. From the regression results, we retrieved the t statistics for time to present (t pre-Neolithic , t NearEast and t Neolithic , respectively) and %HG (t HG ) as well as their P values as the measurements of polygenic adaptation. We also applied Pearson correlation analysis among these four matrices using the R package GGally v2.0 (ref. 55 ).
Heritability partition on genomic regions exhibiting different levels of evidence for natural selection. We adopted a strategy similar to Pardiñas et al. 11 , which first identified genomic regions under different selection pressures then partitioned the heritability of each trait to these regions by LDSC v1.0.1 (ref. 34 ). We obtained and reformatted the following genomic annotations from the literature (under hg19 position): 1. B2 (ref. 33 ), a metric of a set of composite likelihood-ratio test statistics that are based on a mixture model. Regions with high B2 statistics were under balancing selection in about 10,000 generations. We downloaded the B2 statistics calculated by BalLerMix v1.0 (ref. 33 ) on the 1000 Genomes CEU population 38 and annotated each SNP according to the B2 statistics of the region that covered it. 2. Combined Annotation Dependent Depletion (CADD) 7 . We downloaded CADD v1.3 for all 1000 Genomes single-nucleotide variants and indels and dichotomized all variants at the threshold of CADD-phred score >20.
We further generated trait-specific annotations (CADD_traitEnh and CADD_traitWeak) according to the effect of alternative alleles, thus variants whose alternative alleles enhanced the trait and had CADD-phred >20 were annotated '1' for CADD_traitEnh, whereas all other variants were annotated '0' . Similarly, variants whose alternative alleles weakened the trait and had CADD-phred >20 were annotated '1' for CADD_traitWeak. 3. Composites of multiple signals (cms) 31 . cms integrated signals of several previous methods for detection of positive selection and could detect positive selection in the last few tens of thousands of years at high resolution. We directly downloaded the genomic regions with significantly high cms from Grossman et al. 31 and generated trait-specific dichotomized annotations (cms+ and cms−). 4. Hard and soft sweep predicted by Trendsetter v1.0 32 . Trendsetter applied a penalized regression framework that took statistics at adjacent regions into account and predicted whether each genomic region had undergone a hard sweep, soft sweep or neutral alteration. We downloaded the prediction results of the CEU population and labelled each genomic region by the label with the highest probability, and generated trait-specific annotations according to these labels (hard+, hard−, soft+ and soft−). 5. Neanderthal introgressions 29 . We downloaded the average posterior probability scores of being a Neanderthal haplotype from Sankararaman et al. 29 and dichotomized at the top 0.01 probability score. 6. pLi 56 . We curated the gene list of pLi >0.9 as mutation-intolerant genes and generated the annotation of the physical position of these genes and the flanking regions of 100 kb at both 3′ and 5′ ends.
We generated these annotations for each trait and added them to the updated baseline annotations of LDSC 34 to apply a heritability partition. We retrieved the z score for the LDSC coefficient τ c for each annotation as the main results, which measured the heritability enrichment in each annotation on the condition of all other annotations. Some of the annotations from the baseline model that also measured some aspects of natural selection were retrieved (ASMCavg 30 , B statistics 57 , recombination rate 34 , conserved regions 57 , etc.; see 'Choice of LDSC annotations' information in Supplementary Information).

Analysis of derived allele distribution.
We calculated the proportion of included SNPs for which the derived allele had a trait-enhancing effect (traitEnhDA%) and calculated its corresponding P value with two-sided binomial tests assuming a random proportion of 50%. To achieve this, we first obtained the derived allele of about four million SNPs provided by 1000 Genomes phase 1 (ref. 38 ) curated by Field et al. 21 . For each trait, we included in our analysis SNPs meeting the following criteria: having a derived allele annotation, a derived allele frequency between 0.05 and 0.95 in 1000 Genomes European participants, and a GWAS z score >3 or <−3. These SNPs were LD pruned by plink 54 with the parameters '-indep 50 5 2' and 1000 Genomes EUR as reference panels.
Another hypothesis is that purifying selection tended to suppress the alleles with a large effect on traits to a low frequency 58 , thus the proportion of genetic variance explained would be enriched in SNPs with a lower derived allele frequency. We calculated an S statistic to depict this rule, but the forward simulation did not confirm its power ('Definition of S statistics' in Supplementary Information). Thus, we did not include this as part of our main results.

Simulation analysis.
To verify that these statistics were not inflated by population stratification or genetic drift, and properly reject neutral evolution during human history, we generated a forward genetic simulation using Slim 3 (ref. 59 ). As shown in Supplementary Fig. 3, we simulated a pair of 10-MB chromosomes under Gravel's model 60 of human evolution embedded in Slim, with a recombination rate of 1 × 10 −8 and mutation rate of 2.36 × 10 −8 . De novo mutations had 4.5% probability of having a non-zero selection coefficient following a normal distribution and 0.5% probability of being quantitative trait loci (QTL). Details of the method and results of forward simulation can be found in 'Simulation detail' section of Supplementary Information. Integrative analysis of all selection pressures. We first analysed the impact of the following variables on the magnitude of detected selection pressure: 1. h 2 estimated by LDSC. 2. lg(N), where N denotes the sample size of GWAS. 3. Trait heritability enrichment in brain-preferentially expressed genes.
Specifically, we downloaded the genomic annotation of 'SNP around GTEx cortex-specific genes' , 'SNP around all coding genes' , and their corresponding LD scores from the LDSC ftp site. (https://alkesgroup.broadinstitute.org/LD-SCORE/). We ran LDSC on these annotations using the default LDSC setting as above and defined the coefficient z score for cortex genes as the metric of brain expression enrichment. 4. 'meta': whether the GWAS was derived from the meta-analysis (that is, at least two cohorts of different ancestry contributed at least 10% of participants). 5. 'prop': for binary traits, the proportion of minor cases.
We regressed the absolute value of each selection metric against these variables and summarized the t statistics to quantify their impact. Since h 2 and brain enrichment were the inner characteristics of the trait, whereas lg(N), 'meta' and 'prop' were technical GWAS covariates, we decided to adjust all metrics for the latter three variables. Specifically, we first divided all traits according to (1) lg(N) > 5.2, (2) is 'meta' and (3) 'prop' >0.2 (binary traits only). These thresholds were chosen based on the density plots (not shown). In each subgroup, we calculated the z score of the absolute value of each metric (following the positive half of the standard normal distribution) and retained the sign of the original metric.
For each of the seven selection pressure matrices with a definite time scale (z ncm , z ncf , ρ SDS , t Neolithic , t NearEast , t pre-Neolithic and t HG ), we applied linear regression that included all matrices whose time scales were more ancient than it ('Linear models for each selection metric' of Supplementary Information). The regression was run on all traits with reliable results of the corresponding response variables (for example, traits without homogeneous MR results were not included in the regression of z ncm and z ncf ).

Reconstructing the History of Polygenic Scores (RHPS) and RELATE.
To estimate the trajectory of population-mean PRS for each trait, we applied the Relate Reconstructing the History of Polygenic Scores (RHPS) v1.0 method proposed by Edge et al. 23 , which utilized local coalescent trees at GWAS locus to estimate the population-mean PRS of a trait. As suggested by Edge et al. 23 , for each GWAS we calculated the Bayesian factor (BF) for each SNP as where z and s.e. are the GWAS statistics for this SNP. Next, we partitioned all SNPs into 1,702 consecutive and independent blocks provided by Pickrell et al. 61 , and chose the SNP with the largest BF from each block. To maximize the computational efficiency, we retained SNPs with BF > 10,000. The population-mean PRS at ancient time t was estimated as where G denotes all SNPs retained, β i denotes the GWAS effect size of SNP i and pi (t) is the population frequency of SNP i for time t. To estimate p i (t), we applied RELATE (ref. 24 62 and partitioned the traits by hierarchical clustering. We chose the number of clusters as k = 4 by comparing the silhouette coefficient for clustering results with k = 2 to 10.

Impacts of genetic architectures on selection pressures.
Despite the genomic annotations that directly measured selection pressure, we were also interested in whether heritability enrichment in other annotations could impact natural selection. These annotations were roughly divided into three groups (columns of Fig. 6a): those that are associated with selection, termed 'conservative annotations' , and those without evidence of direct association with selection, termed 'functional genomic annotations' , and CADD-related annotations. The annotations of minor allele frequency bin were discarded. For each of the selection pressure metrics (Fig. 6a), we first fitted a full linear model including LDSC z scores of all conservative and functional genomic annotations, then applied a bi-directional step-wise regression aiming at maximization of Akaike information criterion to obtain a simplified model. Full results for all simplified models can be found in Supplementary Table 12.
To analyse the contribution of different groups of annotations, we subtracted three sub-models from each simplified model: (1) a model containing only functional genomic annotations, (2) a model containing functional genomic annotations and CADD annotations and (3) a model containing all annotations. We applied each of these models to generate predicted values for each selection metric and calculated corresponding R 2 values for these precited values.
Statistical analysis. For all comparisons of metrics among groups, we applied a two-sided Fisher-Pitman permutation test by oneway_test from the R package coin v1.3.1 (ref. 63 ). For all comparisons between two metrics, we applied Pearson correlation analysis. For comparisons between two distributions, we applied the two-sided Kolmogorov-Smirnov test. The significance threshold was set at P < 0.05/870 unless otherwise specified.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All GWAS summary statistics analysed in the current study could be downloaded from the public domain. All data generated in the current study could be obtained from the Supplementary Information.

Code availability
Scripts used for this study is available at https://github.com/WeiCSong/selection.

March 2021
Corresponding author(s): Guan Ning Lin Last updated by author(s): Sep 21, 2021 Reporting Summary Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Software and code
Policy information about availability of computer code Data collection Scripts used for this study is available at https://github.com/WeiCSong/selection.

Data analysis
Scripts used for this study is available at https://github.com/WeiCSong/selection. For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A description of any restrictions on data availability -For clinical datasets or third party data, please ensure that the statement adheres to our policy All GWAS summary statistics analyzed in the current study could be downloaded from the public domain. All data generated in the current study could be obtained from the Supplementary materials.