1 Timpson, N. J., Greenwood, C. M. T., Soranzo, N., Lawson, D. J. & Richards, J. B. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat Rev Genet 19, 110-124 (2018).
2 Nielsen, R. et al. Tracing the peopling of the world through genomics. Nature 541, 302-310 (2017).
3 Genetics for all. Nat. Genet. 51, 579 (2019).
4 Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584-591 (2019).
5 Genome of the Netherlands, C. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat. Genet. 46, 818-825 (2014).
6 Consortium, U. K. et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82-90 (2015).
7 Gudbjartsson, D. F. et al. Large-scale whole-genome sequencing of the Icelandic population. Nat. Genet. 47, 435-444 (2015).
8 Kowalski, M. H. et al. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genet. 15, e1008500 (2019).
9 Nagasaki, M. et al. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nat Commun 6, 8018 (2015).
10 Jeon, S. et al. Korean Genome Project: 1094 Korean personal genomes with clinical information. Sci Adv 6, eaaz7835 (2020).
11 Wu, D. et al. Large-Scale Whole-Genome Sequencing of Three Diverse Asian Populations in Singapore. Cell 179, 736-749 e715 (2019).
12 GenomeAsia, K. C. The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature 576, 106-111 (2019).
13 Shi, Y., Li, L., Wang, Y., Chen, J. & Stanley, H. E. A study of Chinese regional hierarchical structure based on surnames. Physica A 518, 169-176 (2019).
14 Xie, G., Lin, Q., Wu, Y. & Hu, Z. The Late Paleolithic industries of southern China (Lingnan region). Quaternary International 535, 21-28 (2020).
15 Cao, Y. et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res. (2020).
16 Xu, S. et al. Genomic dissection of population substructure of Han Chinese and its implication in association studies. Am. J. Hum. Genet. 85, 762-774 (2009).
17 Chen, J. et al. Genetic structure of the Han Chinese population revealed by genome-wide SNP variation. Am. J. Hum. Genet. 85, 775-785 (2009).
18 Liu, S. et al. Genomic Analyses from Non-invasive Prenatal Testing Reveal Genetic Associations, Patterns of Viral Infections, and Chinese Population History. Cell 175, 347-359 e314 (2018).
19 Chiang, C. W. K., Mangul, S., Robles, C. & Sankararaman, S. A Comprehensive Map of Genetic Variation in the World's Largest Ethnic Group-Han Chinese. Mol. Biol. Evol. 35, 2736-2750 (2018).
20 Sirugo, G., Williams, S. M. & Tishkoff, S. A. The Missing Diversity in Human Genetic Studies. Cell 177, 1080 (2019).
21 Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161-164 (2016).
22 Bai, W. Y. et al. Genotype imputation and reference panel: a systematic evaluation on haplotype size and diversity. Brief. Bioinform. (2019).
23 McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279-1283 (2016).
24 Zhu, X. et al. Cohort profile: The Westlake BioBank for Chinese (WBBC) pilot cohort: a prospective study for the late adolescence. medRxiv, 2020.2012.2016.20248291 (2020).
25 Genomes Project, C. et al. A global reference for human genetic variation. Nature 526, 68-74 (2015).
26 Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291 (2016).
27 Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308-311 (2001).
28 Lander, E. S. & Schork, N. J. Genetic dissection of complex traits. Science 265, 2037-2048 (1994).
29 Terhorst, J., Kamm, J. A. & Song, Y. S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303-309 (2017).
30 Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
31 Wilcoxin, F. Probability tables for individual comparisons by ranking methods. Biometrics 3, 119-122 (1947).
32 Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics 13, 397-406 (2014).
33 Thayer, T. et al. Sorting Nexin 29 (SNX29) as a Novel Biomarker for Vasoresponsive Pulmonary Arterial Hypertension. Am. J. Respir. Crit. Care Med. 201, A4397-A4397 (2020).
34 Voight, B. F., Kudaravalli, S., Wen, X. & Pritchard, J. K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).
35 Mou, C. et al. Enhanced ectodysplasin-A receptor (EDAR) signaling alters multiple fiber characteristics to produce the East Asian hair form. Hum. Mutat. 29, 1405-1411 (2008).
36 Tan, J. et al. The adaptive variant EDARV370A is associated with straight hair in East Asians. Hum. Genet. 132, 1187-1191 (2013).
37 Riddell, J., Basu Mallick, C., Jacobs, G. S., Schoenebeck, J. J. & Headon, D. J. Characterisation of a second gain of function EDAR variant, encoding EDAR380R, in East Asia. Eur. J. Hum. Genet. (2020).
38 CONVERGE, c. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature 523, 588-591 (2015).
39 Das, S., Abecasis, G. R. & Browning, B. L. Genotype Imputation from Large Reference Panels. Annu Rev Genomics Hum Genet 19, 73-96 (2018).
40 Huang, J. et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat Commun 6, 8111 (2015).
Methods References
41 Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-595 (2010).
42 Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics 43, 11 10 11-11 10 33 (2013).
43 Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839-848 (2012).
44 Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867-2873 (2010).
45 Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
46 Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156-2158 (2011).
47 Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet Chapter 7, Unit7 20 (2013).
48 Schwarz, J. M., Cooper, D. N., Schuelke, M. & Seelow, D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods 11, 361-362 (2014).
49 Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980-985 (2014).
50 Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
51 Browning, B. L., Zhou, Y. & Browning, S. R. A One-Penny Imputed Genome from Next-Generation Reference Panels. Am. J. Hum. Genet. 103, 338-348 (2018).
52 Linderman, M. D. et al. Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Med. Genomics 7, 20 (2014).
53 McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297-1303 (2010).
54 McVean, G. A genealogical interpretation of principal components analysis. PLoS Genet. 5, e1000686 (2009).
55 Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene frequencies in Europeans. Science 201, 786-792 (1978).
56 Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655-1664 (2009).
57 Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904-909 (2006).
58 Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065-1093 (2012).
59 Weir, B. S. & Cockerham, C. C. Estimating F-Statistics for the Analysis of Population Structure. Evolution 38, 1358-1370 (1984).
60 Browning, B. L. & Browning, S. R. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194, 459-471 (2013).
61 Field, Y. et al. Detection of human adaptation during the past 2000 years. Science 354, 760-764 (2016).
62 Gautier, M., Klassmann, A. & Vitalis, R. rehh 2.0: a reimplementation of the R package rehh to detect positive selection from haplotype structure. Mol. Ecol. Resour. 17, 78-90 (2017).
63 Pickrell, J. K. et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826-837 (2009).
64 Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).
65 Delaneau, O., Zagury, J. F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods 10, 5-6 (2013).
66 Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76-82 (2011).
67 Delaneau, O., Marchini, J. & Zagury, J. F. A linear complexity phasing method for thousands of genomes. Nat Methods 9, 179-181 (2011).
68 Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284-1287 (2016).
69 Huang, L. et al. Genotype-imputation accuracy across worldwide human populations. Am. J. Hum. Genet. 84, 235-250 (2009).