Understanding Genetic Variability: Exploring Copy Number Variants through Non-Invasive Prenatal Testing in European Populations

doi:10.21203/rs.3.rs-3144965/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Copy number variants (CNVs) are structural alterations in the genome that involve the duplication or deletion of DNA segments. They contribute to genetic diversity and play a crucial role in evolution and in the development of various diseases and disorders. Massively parallel sequencing (MPS) has revolutionized the field of genetic analysis and contributed significantly to routine clinical diagnosis and screening, and it offers a precise method for detecting CNVs. In this context, non-invasive prenatal testing (NIPT) based on the sequencing of cell-free DNA (cfDNA) from pregnant women's plasma using a low coverage whole genome MPS (WGS) approach represents a valuable source for population studies. Here, we analyzed genomic data of 12 732 pregnant women from the Slovak (9 230), Czech (1 583), and Hungarian (1 919) populations. We identified 5 062 CNVs ranging from 200 kbp to 75 260 kbp and described their basic characteristics and the differences between the studied populations. Our results suggest that re-analysis of sequencing data from routine WGS assays has the potential to yield CNV population frequencies and may provide valuable information to support the classification and interpretation of this type of genetic variation.

copy number variation

whole genome sequencing

non-invasive prenatal testing

population study

CNV frequency comparison

Prenatal testing has undergone a prolonged development from the traditional invasive methods such as amniocentesis or chorionic villus sampling [1]. Since the discovery of cell-free fetal DNA (cffDNA) in the maternal plasma, non-invasive prenatal testing (NIPT) has been integrated into clinical practice and has become a standard practice in developed countries. In some countries, these tests are already being implemented into public prenatal care. In the Netherlands, NIPT became available in 2014 as part of the TRIDENT-1 study for pregnant women at increased risk of common trisomies [2]. Subsequently, the TRIDENT-2 study was launched in 2017 with the aim of offering NIPT as the first-tier test for all pregnant women [3].

Most of the current NIPT approaches are based on low coverage whole genome sequencing (WGS) of DNA from the blood plasma of pregnant women. In this way, chromosomal ploidy can be determined [4], and the technique has proved reliable in detecting trisomies and other fetal chromosomal abnormalities [3, 5]. In addition, several extensions allow the detection of subchromosomal aberrations, such as microdeletions and microduplications [6]. This type of genetic variation, also known as copy number variants (CNVs), results from the loss or amplification of DNA segments ranging from 50 bp to tens of Mb. CNVs have previously been shown to be a common part of the human genome [7, 8] and to contribute to population diversity [9, 10]. Furthermore, CNVs play an important role in evolution: they contribute to the development of various diseases, influence biological processes that affect morphological variability, and affect host-microbiome interactions and susceptibility to infection [11].

Clinical tests such as NIPT are mostly focused on the genetic analysis of the fetus. However, maternal DNA is also analyzed, which offers additional data for further supporting analyses. Individuals who have undergone NIPT represent a sample that is restricted to women of reproductive age but is still a relatively large sample of the adult female population. Therefore, their sequencing data could be a valuable source for population studies. This proposal is based on our previous work, where we proposed NIPT as a source of population-specific allelic frequencies [12], and on subsequent work where the potential of CNVs ≥ 600 kbp in the Slovak female population was shown (Pös et al., 2019a). In this study, we focused on comparing even smaller variants, CNVs ≥ 200 kbp, in pregnant women from Slovakia, Hungary, and Czechia. We demonstrate that, without additional financial investment in laboratory preparations, this approach has the potential to yield the population frequencies of large-scale CNVs. Our research broadens the general knowledge of this type of human genetic variability, which is currently poorly studied. Consequently, maternal genomic data obtained from NIPT can offer valuable information for researchers, laboratory diagnosticians, and clinical geneticists, since the CNVs could be used as supporting evidence for the classification and interpretation of other variant findings.

2.1 Cohort specification

We analyzed sequencing data of 12 732 women undergoing NIPT after the tenth week of pregnancy. The data were provided by TRISOMYtest Ltd., which was responsible for sample processing and sequencing analysis. Enrolled individuals are representatives of the Slovak (9 230), Czech (1 583), and Hungarian (1 919) populations. The median age of the cohort is 35 years, with a range of 18 to 51 years. Data were collected between 2016 and 2021.

2.2 Sample preparation

Plasma samples of pregnant women were collected and processed for analysis by the protocol that has been described in our previous work [13]. The WGS data were generated by the Illumina NextSeq 500/550 platform as a part of routine NIPT. Since the subject of the presented work was mainly the re-analysis of sequencing data, the protocol for processing biological samples will not be discussed in more detail. It was suggested that sample handling and data analysis contributed significantly to the previously reported excess of population-stratified variants [14]. Thus, we eliminated heterogeneity in sample processing between laboratories as much as possible, so only samples processed by the same protocol and sequenced on the same type of equipment were included in the following analyses.

2.3 CNV identification

Sequencing reads were aligned to the reference genome GRCh37 using the Bowtie2 algorithm [15]. Subsequently, the coordinates were converted to GRCh38 with the CrossMap tool [16]. If the conversion did not yield a single region (rare cases), we discarded converted regions on chromosomes other than the source chromosome and regions shorter than 200 bp, and the remaining regions were joined into one contiguous region. We used only the start position of each mapped read, and only reads with mapping quality ≥ 40 were stored. Since the exact sequence, read quality, and mapping quality do not enter further analyses, we did not store these data, to save disk space. To identify microaberrations, reads were grouped into bins of 20 kbp. Then a two-step normalization was employed: 1) LOESS-based correction to eliminate GC bias [17] and 2) PCA normalization to remove higher-order population artifacts on autosomes [18]. To increase the accuracy of the results, it was necessary to filter out sequences that are a common source of errors, most of which are poorly mapped regions near the centromeres. Finally, the genome coverage signal was split into regions of equal level using the circular binary segmentation algorithm from the R package DNAcopy [19], and segments with abnormal copy numbers were identified. Due to the detection capability of the methodology used, the lower limit for the identification of maternal CNVs was set to 200 kbp, considering only segments with at least a 60% signal increase/decrease compared to the reference [6].
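The read-binning step described above can be illustrated with a minimal sketch. It assumes that reads are already available as (chromosome, start position, mapping quality) tuples; the function name `bin_read_starts` and the toy reads are hypothetical and are not taken from the authors' pipeline. Only the 20 kbp bin size and the mapping-quality cutoff of 40 come from the text, and the normalization and segmentation steps are not reproduced here.

```python
from collections import Counter

BIN_SIZE = 20_000   # bin width in bp, as stated in the text
MIN_MAPQ = 40       # minimum mapping quality kept, as stated in the text


def bin_read_starts(reads, bin_size=BIN_SIZE, min_mapq=MIN_MAPQ):
    """Count reads per (chromosome, bin index) using only read start positions.

    `reads` is an iterable of (chrom, start, mapq) tuples with 0-based starts
    (an assumption of this sketch). Reads below the cutoff are skipped.
    """
    counts = Counter()
    for chrom, start, mapq in reads:
        if mapq < min_mapq:
            continue
        counts[(chrom, start // bin_size)] += 1
    return counts


# Hypothetical toy input, for illustration only.
toy_reads = [
    ("chr1", 100, 42),
    ("chr1", 19_999, 42),
    ("chr1", 20_000, 42),   # first position of the second bin
    ("chr1", 25_000, 10),   # dropped: mapping quality below 40
    ("chr2", 45_000, 41),
]
bin_counts = bin_read_starts(toy_reads)
```

Keeping only the start position and a per-bin count reflects the storage choice mentioned in the text: once a read has passed the quality filter, nothing else about it is needed downstream.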

Due to a certain degree of uncertainty in the detection algorithm [20], we had to deal with CNVs that displayed nearly the same coordinates but were not reported as the same CNV event. Thus, CNVs whose start and/or end coordinates differed by up to two (unfiltered) bins (40 000 bp) were considered equal. Then we matched identical CNVs across populations and determined whether there was a significant difference in their representation between populations. The resulting list of CNVs with allelic frequency over 1% (7 gains, 8 losses) is shown in Supplementary Tables 3 and 4.
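The coordinate-tolerance rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the rule is read as "start and end each differ by at most 40 000 bp", matching chromosome and CNV type is assumed to be required, and the greedy grouping strategy, the function names `same_cnv` and `group_cnvs`, and the second call in the example are all hypothetical.

```python
BIN_SIZE = 20_000
TOLERANCE = 2 * BIN_SIZE  # 40 000 bp, as stated in the text


def same_cnv(a, b, tolerance=TOLERANCE):
    """Return True if two CNV calls are treated as the same event.

    Each call is a (chrom, start, end, kind) tuple, where kind is "gain"
    or "loss". Requiring identical chromosome and kind is an assumption.
    """
    return (
        a[0] == b[0]
        and a[3] == b[3]
        and abs(a[1] - b[1]) <= tolerance
        and abs(a[2] - b[2]) <= tolerance
    )


def group_cnvs(calls, tolerance=TOLERANCE):
    """Greedily assign each call to the first group whose representative matches."""
    groups = []  # groups[i][0] is the representative call of group i
    for call in sorted(calls):
        for group in groups:
            if same_cnv(group[0], call, tolerance):
                group.append(call)
                break
        else:
            groups.append([call])
    return groups


# Example calls; the second one is invented to show a near-identical call.
calls = [
    ("8", 2_260_000, 2_640_000, "gain"),
    ("8", 2_280_000, 2_620_000, "gain"),   # within 40 kbp at both ends
    ("8", 2_340_000, 2_580_000, "gain"),   # start differs by 80 kbp
    ("15", 22_760_000, 23_080_000, "loss"),
]
groups = group_cnvs(calls)
```

A greedy pass against one representative per group is simple but order-dependent; a stricter analysis might compare against every member of a group or use a clustering method instead.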

2.4 Statistical analysis

The Python library pandas was used for data analysis [21]. The significance of our findings was evaluated using statistical tests implemented in the Python SciPy package [22]. Charts were created using the Python Plotly graphing library (Inc., P.T., 2015. Collaborative data science. Available at: https://plot.ly). The Chi-square test was used to determine the significance of differences between populations for all the following statistical analyses, including numbers, distributions and overlaps of CNVs.

Our CNV calling pipeline identified 5 062 CNVs ranging from 200 kbp to 75 260 kbp (median size 320 kbp). Altogether, 4 042 individuals (31.19%) carried at least one CNV, of which 79.56% carried only one CNV and 23.44% carried at least two CNVs. Moreover, one woman from the Slovak population carried as many as 32 CNVs, suggesting genomic instability. The gains-to-losses ratio was approximately 2.5:1 in all the populations (Table 1).

Table 1

Data summary for individual populations of pregnant women undergoing NIPT analysis.
Population	Samples	Samples with at least 1 CNV	CNVs	Gains	Losses
Slovak	9 230	2 900	3 585	2 578 (72%)	1 007 (28%)
Czech	1 583	510	622	460 (74%)	162 (26%)
Hungarian	1 919	632	855	611 (71%)	244 (29%)
Sum	12 732	4 042	5 062	3 649 (72%)	1 413 (28%)
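As an illustration of the Chi-square test of independence used throughout the analysis, the sketch below applies it to the gain and loss counts from Table 1. It is written in plain Python rather than with SciPy so that it has no dependencies, and it relies on the fact that a 3 x 2 table has 2 degrees of freedom, for which the p-value has the closed form exp(-x/2). The function name `chi_square_independence` is hypothetical, and this particular comparison is a worked example rather than one of the tests reported by the authors.

```python
import math

# Gains and losses per population, copied from Table 1.
observed = {
    "Slovak": (2578, 1007),
    "Czech": (460, 162),
    "Hungarian": (611, 244),
}


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


stat, dof = chi_square_independence(observed)

# For exactly 2 degrees of freedom the chi-square survival function is
# exp(-x / 2); this shortcut does not hold for any other dof.
assert dof == 2
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.3f}")
```

The resulting p-value is well above 0.05, in line with the similar gains-to-losses ratio seen in all three populations. For tables of other shapes, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected counts in one call.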

Excluding the sex chromosome X, chromosome 6 contained the most gains, namely 11.6%, 10.7%, and 10.8% of all gains found in the Slovak, Czech and Hungarian populations, respectively. On the other hand, the highest count of losses was observed on chromosome 7 for all three populations (Slovak 10.0%, Czech 14.2% and Hungarian 12.3%). With a few exceptions, the overall count of CNVs decreased with the length of the chromosomes (Fig. 1a, Supplementary Table 1). To determine the length distribution of the variants, we divided them into size ranges of 100 kbp. The most frequent CNV size was 200 kbp to 500 kbp; this range contained around 70–85% of all the CNVs. Larger CNVs were rare and their count decreased with increasing size (Fig. 1b, Supplementary Table 2).

By comparing the distributions of CNV distances, either to chromosomal ends or to centromeric regions, we found that CNVs were overrepresented close to telomeres and centromeres (Fig. 2). The average frequency of CNVs per 1 Mbp of random genome sequence was 0.041%, while the average CNV frequencies within 1 Mbp proximal to centromeres and telomeres were 8.48% and 7.70%, respectively (Table 2). However, this enrichment may result from a combination of technical and biological effects, since the detection method provides reduced precision in regions with low mappability, which usually include regions near centromeres and chromosome ends.

Table 2

Average CNV frequency within 1 Mbp of different genomic regions.
Region	Slovak	Czech	Hungarian	Average
1 Mbp of random haploid genome sequence	0.040%	0.038%	0.044%	0.041%
1 Mbp proximal to centromere	5.76%	7.04%	12.66%	8.48%
1 Mbp proximal to telomeres	7.31%	7.41%	8.39%	7.70%
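One way to decide whether a CNV lies within 1 Mbp of a chromosome end or a centromere is sketched below. The source does not specify how distances were measured, so the edge-to-edge definition, the function name `near_regions`, and the chromosome length and centromere coordinates in the example are all assumptions made for illustration.

```python
WINDOW = 1_000_000  # 1 Mbp, the window size used in Table 2


def near_regions(cnv_start, cnv_end, chrom_length, cen_start, cen_end,
                 window=WINDOW):
    """Report whether a CNV lies within `window` bp of a chromosome end
    or of the centromeric interval.

    Distances are measured between the closest edges of the intervals,
    which is one possible definition and an assumption of this sketch.
    """
    dist_telomere = min(cnv_start, chrom_length - cnv_end)
    if cnv_end < cen_start:
        dist_centromere = cen_start - cnv_end
    elif cnv_start > cen_end:
        dist_centromere = cnv_start - cen_end
    else:
        dist_centromere = 0  # CNV overlaps the centromeric interval
    return {
        "telomere": dist_telomere <= window,
        "centromere": dist_centromere <= window,
    }


# Hypothetical chromosome: 100 Mbp long, centromere at 40-43 Mbp.
CHROM_LENGTH = 100_000_000
CEN = (40_000_000, 43_000_000)

near_end = near_regions(500_000, 800_000, CHROM_LENGTH, *CEN)
near_cen = near_regions(39_300_000, 39_700_000, CHROM_LENGTH, *CEN)
mid_arm = near_regions(20_000_000, 20_400_000, CHROM_LENGTH, *CEN)
```

In practice the chromosome lengths and centromere intervals would come from the annotation of the reference assembly in use.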

Using the Chi-square test, we compared population differences in the count of CNVs on all chromosomes and found a statistically significant difference in CNV gains (p-value = 0.0113). A comparison of the individual population pairs showed a significant difference between the Slovak and Hungarian populations (p-value = 0.0396 from the Chi-square test). However, when comparing population differences in the count of CNVs on individual chromosomes, we did not find any significant difference after Bonferroni correction (0.05/23 = 0.0022).

We found a statistically significant difference in CNV length distribution between populations (p-value = 8.69 × 10^−14) when we compared the count of CNV gains in individual length ranges (Fig. 1b). The comparison of individual population pairs showed a significant difference between the Slovak and Hungarian populations (p-value = 8.88 × 10^−16). When we compared population pairs within each individual CNV length range, we found a significant difference between the Slovak and Hungarian populations in the length ranges 200–300 kbp (p-value = 0.000315), 3–4 Mbp (p-value = 1.86 × 10^−18) and 4–5 Mbp (p-value = 0.000225), and between the Czech and Hungarian populations in the length range 3–4 Mbp (p-value = 0.000758), all after Bonferroni correction (0.05/23 = 0.002). We did not find any significant population difference in the count of CNV losses, either overall or in individual length ranges.
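The Bonferroni correction used above amounts to dividing the significance level by the number of tests and comparing each p-value with the resulting threshold. The sketch below applies this to the pairwise p-values reported in the preceding paragraph; the variable names are hypothetical.

```python
ALPHA = 0.05
N_TESTS = 23  # number of tests used for the correction in the text

threshold = ALPHA / N_TESTS  # approximately 0.0022

# Pairwise p-values for CNV gains per length range, as reported in the text.
reported = {
    ("Slovak-Hungarian", "200-300 kbp"): 0.000315,
    ("Slovak-Hungarian", "3-4 Mbp"): 1.86e-18,
    ("Slovak-Hungarian", "4-5 Mbp"): 0.000225,
    ("Czech-Hungarian", "3-4 Mbp"): 0.000758,
}

# A comparison counts as significant only if its p-value is below the threshold.
significant = {key: p < threshold for key, p in reported.items()}
```

All four reported comparisons fall below the corrected threshold, whereas an uncorrected p-value such as 0.0396 would not.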

We continued by searching for the most prevalent CNVs in the population, specifically those with a frequency exceeding 1%, which can be considered copy number polymorphisms [23]. We found 7 gains and 8 losses that showed allelic frequency ≥ 1% in at least one population (Supplementary Table 3). When we compared these variants with the publicly available gnomAD SVs v2.1 database (European) [24], we found no comparable range in six cases (gains: 8:2340000–2580000; 15:32020000–32420000; 22:22280000–22580000; losses: 7:64680000–64900000; 9:11840000–12200000; 15:22760000–23080000) (Supplementary Table 4). After applying automated ACMG guidelines available at https://genovisio.com, 8 variants were classified as variants of uncertain significance (VUS), which have no known clinical relevance, and 7 variants were classified as benign. According to the ISV tool [25], 4 variants were VUS and 11 variants were benign. Using the artificial intelligence integrated in the X-CNV predictive tool [26], we identified 10 variants as benign, 2 as likely benign, 2 as VUS and 1 as pathogenic. For 7 variants, the prediction matched in all three tools (Supplementary Table 4).

Considering the counts of variants between populations, we found a difference in the representation of variants 8:2260000–2640000 (p = 2.18 × 10^−8), 8:2340000–2580000 (p = 2.29 × 10^−13), and 12:20960000–21400000 (p = 1.63 × 10^−3; statistically significant after Bonferroni correction; Supplementary Table 3). These CNVs were not present in at least one population, so we considered their occurrence to be zero in the given population. When comparing such CNVs only between the two populations with non-zero counts, we observed a different representation of 12:20960000–21400000 (SK-HU, p = 0.00168) (Supplementary Table 3).

Since CNVs can overlap different genomic regions, we explored the representation of protein-coding genes, long non-coding RNAs (lncRNAs), and microRNAs (miRNAs) in our cohorts. Coordinates for individual genomic regions, also known as biotypes, were downloaded from the Ensembl genome database [27]. Subsequently, following Woodwark and Bateman [28], three types of overlap were defined for each biotype (Fig. 3): I.) biotype in CNV (the CNV entirely encompasses the genomic region), II.) biotype partially overlapped by the CNV (the start position of the CNV is upstream of the given region while its end position lies within the region, or the start position of the CNV lies within the region while its end position is downstream of it), and III.) CNV in biotype (the genomic region entirely encompasses the CNV). The total sum of all CNV-biotype overlaps in the studied populations for gains and losses is shown in Table 3.
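The three overlap types defined above reduce to a comparison of interval endpoints, as the following sketch shows. The function name `classify_overlap`, the closed-interval convention, and the handling of identical coordinates are assumptions of this illustration and are not taken from the authors' scripts.

```python
def classify_overlap(cnv_start, cnv_end, region_start, region_end):
    """Classify how a CNV overlaps a genomic region (gene, lncRNA or miRNA).

    Returns "I" (region entirely within the CNV), "II" (partial overlap),
    "III" (CNV entirely within the region) or None (no overlap).
    Intervals are treated as closed. If the CNV and the region have identical
    coordinates, this sketch reports type I, which is an arbitrary choice.
    """
    if cnv_end < region_start or region_end < cnv_start:
        return None
    if cnv_start <= region_start and region_end <= cnv_end:
        return "I"
    if region_start <= cnv_start and cnv_end <= region_end:
        return "III"
    return "II"


# Hypothetical coordinates, for illustration only: a CNV spanning 1000-5000.
type_one = classify_overlap(1000, 5000, 2000, 3000)    # region inside the CNV
type_two = classify_overlap(1000, 5000, 4000, 6000)    # region extends past the CNV end
type_three = classify_overlap(1000, 5000, 500, 9000)   # CNV inside the region
no_overlap = classify_overlap(1000, 5000, 6000, 7000)  # disjoint intervals
```

Checking the two containment cases first leaves partial overlap as the only remaining possibility, which keeps the type II condition from having to be spelled out for both directions.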

Table 3

The ratio of CNV-biotype overlaps in a given population for gains and losses.
Biotype	Gains (Slovak)	Gains (Czech)	Gains (Hungarian)	Losses (Slovak)	Losses (Czech)	Losses (Hungarian)
gene*	30.59%	32.56%	26.84%	11.74%	7.82%	6.69%
lncRNA	57.42%	64.90%	46.90%	25.03%	20.99%	18.40%
miRNA	0.003%	0.003%	0.003%	0.001%	0.001%	0.000%
*gene represents protein-coding sequences, including both exons and introns; lncRNA - long non-coding RNA; miRNA - microRNA.

On average, 39% of CNV sequences overlap protein-coding genes, of which 30% are attributable to gains and 9% to losses. Moreover, more than half of all CNV sequences (78% on average) overlapped lncRNAs (56% for gains, 22% for losses). On the other hand, CNV-miRNA overlaps were near zero, since miRNAs constitute a small portion of the genome. Every type of CNV-biotype overlap, calculated separately, is listed in Supplementary Table 5 and Supplementary Fig. 1.

The MPS method has become an integral part of prenatal care in recent years, as it allows for non-invasive prenatal screening of fetal aneuploidies and structural aberrations. However, clinical assays such as NIPT are mostly single-purpose tests focused on fetal genetic analysis, although this approach provides a wealth of data from maternal DNA that can be used for other supporting analyses. Here, we propose additional possibilities for the use of genomic data generated by routine NIPT screening based on cfDNA sequencing from the plasma of pregnant women using a WGS approach.

Patients who have undergone NIPT represent a sample of the population, so their genomic data can be valuable for population studies. This is particularly relevant in countries where NIPT has been implemented in public prenatal care, such as the Netherlands [29]. On the samples of Slovak, Czech, and Hungarian pregnant women, we have shown that without additional investment in laboratory consumables, NIPT has the potential to obtain population frequencies of large-scale CNVs. Our findings could help to understand this important type of human genetic variability, as it is a poorly studied genetic phenomenon.

The negative correlation between the length and the number of CNVs observed across all populations is consistent with previous studies [14, 30]. Since shorter CNVs are less likely to hit a critical region, they are not subject to selection as strong as that acting on large-scale CNVs. It is also known that losses are more deleterious to the genome than CNV gains [31]. Accordingly, the overall gain/loss ratio was in favor of gains in all the populations. Although large-scale CNVs are common in normal individuals, the length and the type (gain/loss) of aberrations seem to be among the most limiting factors reflecting the deleterious effect of CNVs on the viability of individuals.

CNVs were not uniformly distributed on chromosomes between populations (Chi-square test p = 0.0031) with (Chi-square test p = 0.042). However, since we tested multiple hypotheses (23 chromosomes), the difference was not significant after Bonferroni adjustment (Bonferroni-corrected threshold p = 0.0022). On the other hand, the distribution of CNVs on chromosomes differs from that in a previous study evaluating CNVs ≥ 600 kbp, suggesting that CNVs of different lengths preferentially occupy certain chromosomes. This could be related to gene density and the type of genomic elements, as they are expected to be under different degrees of constraint for variation in copy number [9, 32]. The overall distribution of CNVs was not uniform along the chromosomes; instead, CNVs were enriched in regions proximal to telomeres and centromeres. These findings support previous studies showing that CNVs occur near centromeres and telomeres more frequently than expected by chance [33, 34]. The length distribution of large-scale gains also differs between populations, with the Hungarian population being the most distinct in our cohorts.

We found a copy number polymorphism (defined as a variant with allelic frequency ≥ 1%) [23] that seems to be specific to the Slovak population, a gain at 8:2260000–2640000 (Supplementary Table 3). No comparable record for this region was found in the gnomAD database. Although the CNV overlaps no protein-coding genes and was predicted to be benign, it spans 59 regulatory elements and 7 lncRNA sequences with potential biological functions. The loss of chr15:22760000–23080000 was a frequent CNV overlapping the morbid gene NIPA1, which is associated with hereditary spastic paraplegia. It has been shown that NIPA1 inhibits bone morphogenetic protein signaling, which is critical for the regulation of synaptic growth and axonal microtubules [38]. Thus, NIPA1 loss of function may lead to defects in synapse and axon development [39]. Such examples demonstrate that a routine NIPT test can provide health-related information not only for the fetus but also for the mother and other potential offspring. However, this raises ethical questions that should be discussed [29].

CNVs can affect gene expression through complex mechanisms that extend beyond gene dosage effects. Underlying mechanisms include insertions and deletions of regulatory regions and alterations of the physical proximity of genes and regulatory elements [40], which are pivotal in important biological pathways with consequences for evolution, population genetics, epigenetics, gene function, and human phenotype [41, 42]. Since miRNA precursors are usually tens of nucleotides long, they are too short to contain a large-scale CNV. On the other hand, all affected miRNAs were fully encompassed by CNVs, potentially resulting in aberrant miRNA dosage. Considering the role of miRNA in post-transcriptional silencing of gene expression, such CNVs may affect essential physiological processes. Some genes and lncRNAs are long enough to contain a CNV ≥ 200 kbp, so a minor proportion of overlaps were of this type. However, owing to their large size, the majority of CNVs completely encompassed the sequence of the particular biotype (22% for genes and 56% for lncRNA). lncRNAs have been shown to be involved in the epigenetic regulation of allelic expression and in post-transcriptional gene regulation, and they may act as scaffolds for protein complexes or as precursors for small non-coding RNAs [43]. Since lncRNAs play such important roles, their alterations can affect human metabolism or contribute to the development of pathologies. Although many non-coding RNA sequences have been discovered, their functions remain poorly explored; thus, it is not possible to draw reliable conclusions about the impact of most overlapping CNVs on the physiology of individuals. Nevertheless, knowledge from population genetic studies has significantly influenced current medicine and has proven to be important for understanding our genome and clarifying its role in disease development [44–46].

Mapping the regions that can be deleted from the human genome without apparent phenotypic consequences is of great benefit for the interpretation of new CNV findings in both clinical and research applications [9]. Following the expansion of CNV analysis in clinical laboratories, these resources will be an invaluable aid to researchers, laboratory diagnosticians, and clinical geneticists in structural variant classification.

We have shown several differences between populations in large-scale CNVs; however, our study has several limitations. The compared populations are geographically close, so the population differences could be blurred by gene flow between populations over the years. Despite the effort for consistency between the laboratories that provided us with data, we cannot rule out some differences in sample handling, which is an important factor affecting cfDNA analysis [47]. However, these should not affect maternal CNV representations, since our method has shown high robustness and reliability for this purpose [20]. The samples obtained within the laboratory from the studied populations should represent, to some extent, the structure of the population of interest. However, samples from different ethnic groups could also be included, and thus the percentage of variability in individual populations could be artificially increased. Information on the ethnicity of patients undergoing the routine test could therefore add value to further population studies of this kind. Moreover, the data are anonymized and contain no information on the health status of the individual, so we were not able to relate a patient phenotype to the supposed consequences of CNVs. If patients were asked to provide at least basic anamnestic and demographic data, it could add valuable insights into ambiguous variants.

Our results suggest that reanalysis of sequencing data from routine low coverage WGS has the potential to yield population frequencies of large-scale CNVs with no need for additional funds for laboratory sample processing. We conclude that anonymized basic anamnestic and demographic data could significantly increase the value of such population studies and add valuable insights to support the classification of ambiguous variants. Nevertheless, this approach can provide information to help laboratory diagnosticians and clinical geneticists interpret large-scale CNVs.

Ethics approval and consent to participate

All the enrolled participants gave written informed consent for inclusion in this study. The study was conducted following the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Bratislava Self-Governing Region on 30 June 2015 (03899/2015/HF), 25 March 2020 (05006/2020/HF/2) and 17 January 2023 (4530/2023/HF).

Consent for publication

Consent for publication was obtained from all subjects involved in the study.

Availability of data and materials

The datasets for this study (input files, scripts, and output files) can be found on GitHub: https://github.com/marcelTBI/CNV_population_study.

Competing interests

Zuzana Holesova, Ondrej Pös, Juraj Gazdarica, Marcel Kucharik, Jaroslav Budis, and Tomas Szemes are employees of Geneton Ltd., which is involved in numerous research and development efforts dedicated to adapting new technologies to better understand genomic data and facilitate their implementation in effective and reliable patient care. Michaela Hyblova and Gabriel Minarik are employees of TRISOMYtest Ltd. and Medirex Group Academy n.o.

Funding

This research was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956229 (ALPACA) and No 872539 (PANGAIA). The funding was also provided by the Slovak Research and Development Agency grant APVV-21-0296 (INCAM) and by the Operational Programme Integrated Infrastructure for the project ITMS: 313011ATL7 (PanClinCov), ITMS: 313021BUZ3 (USCCCORD) and ITMS: 313011AVH7 (DiaCovid), co-financed by the European Regional Development Fund.

Authors’ contributions

Conceptualization, O.P., J.B., J.G. and T.S.; methodology, Z.H.; software, M.K.; validation, J.G.; formal analysis, Z.H. and J.G.; investigation, Z.H. and O.P.; resources, G.M., M.H.; data curation, M.K.; writing—original draft preparation, O.P.; writing—review and editing, O.P.; visualization, Z.H.; supervision, J.B.; project administration, J.B.; funding acquisition, T.S. All authors have read and agreed to the published version of the manuscript.

Acknowledgements

We would like to thank all of the participating centers who submitted samples used in this study. Additionally, we’d like to thank the patients who agreed to participate. We appreciate the support of the laboratory staff for their contributions.

Bringman JJ. Invasive prenatal genetic testing: A Catholic healthcare provider’s perspective. Linacre Q. 2014;81:302–13.
Oepkes D, Page-Christiaens GCL, Bax CJ, Bekker MN, Bilardo CM, Boon EMJ, et al. Trial by Dutch laboratories for evaluation of non-invasive prenatal testing. Part I-clinical impact. Prenat Diagn. 2016;36:1083–90.
van der Meij KRM, Sistermans EA, Macville MVE, Stevens SJC, Bax CJ, Bekker MN, et al. TRIDENT-2: National Implementation of Genome-wide Non-invasive Prenatal Testing as a First-Tier Screening Test in the Netherlands. Am J Hum Genet. 2019;105:1091–101.
Chitty LS, Lo YMD. Noninvasive Prenatal Screening for Genetic Diseases Using Massively Parallel Sequencing of Maternal Plasma DNA. Cold Spring Harb Perspect Med. 2015;5:a023085.
Gazdarica J, Budis J, Duris F, Turna J, Szemes T. Adaptable Model Parameters in Non-Invasive Prenatal Testing Lead to More Stable Predictions. Int J Mol Sci [Internet]. 2019;20. Available from: http://dx.doi.org/10.3390/ijms20143414
Kucharik M, Gnip A, Hyblova M, Budis J, Strieskova L, Harsanyova M, et al. Non-invasive prenatal testing (NIPT) by low coverage genomic sequencing: Detection limits of screened chromosomal microdeletions. PLoS One. 2020;15:e0238245.
Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, et al. Detection of large-scale variation in the human genome. Nat Genet. 2004;36:949–51.
Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. Large-scale copy number polymorphism in the human genome. Science. 2004;305:525–8.
Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat Rev Genet. Nature Publishing Group; 2015;16:172–83.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–54.
Pös O, Radvanszky J, Buglyó G, Pös Z, Rusnakova D, Nagy B, et al. DNA copy number variation: Main characteristics, evolutionary significance, and pathological aspects. Biomed J. 2021;44:548–59.
Budis J, Gazdarica J, Radvanszky J, Harsanyova M, Gazdaricova I, Strieskova L, et al. Non-invasive prenatal testing as a valuable source of population specific allelic frequencies. J Biotechnol. 2019;299:72–8.
Hyblova M, Harsanyova M, Nikulenkov-Grochova D, Kadlecova J, Kucharik M, Budis J, et al. Validation of Copy Number Variants Detection from Pregnant Plasma Using Low-Pass Whole-Genome Sequencing in Noninvasive Prenatal Testing-Like Settings. Diagnostics (Basel) [Internet]. 2020;10. Available from: http://dx.doi.org/10.3390/diagnostics10080569
Itsara A, Cooper GM, Baker C, Girirajan S, Li J, Absher D, et al. Population analysis of large copy number variants and hotspots of human genetic disease. Am J Hum Genet. 2009;84:148–61.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
Zhao H, Sun Z, Wang J, Huang H, Kocher J-P, Wang L. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics. 2014;30:1006–7.
Liao C, Yin A-H, Peng C-F, Fu F, Yang J-X, Li R, et al. Noninvasive prenatal diagnosis of common aneuploidies by semiconductor sequencing. Proc Natl Acad Sci U S A. 2014;111:7415–20.
Zhao C, Tynan J, Ehrich M, Hannum G, McCullough R, Saldivar J-S, et al. Detection of fetal subchromosomal abnormalities by sequencing circulating cell-free DNA from maternal plasma. Clin Chem. 2015;61:608–16.
DNAcopy [Internet]. Bioconductor. [cited 2022 Jun 22]. Available from: http://bioconductor.org/packages/DNAcopy/
Kucharík M, Budiš J, Hýblová M, Minárik G, Szemes T. Copy Number Variant Detection with Low-Coverage Whole-Genome Sequencing Represents a Viable Alternative to the Conventional Array-CGH. Diagnostics (Basel) [Internet]. 2021;11. Available from: http://dx.doi.org/10.3390/diagnostics11040708
The pandas development team. pandas-dev/pandas: Pandas [Internet]. Zenodo; 2023. Available from: https://zenodo.org/record/3509134
Singh A. Review of “SciPy 1.0: fundamental algorithms for scientific computing in Python” [Internet]. 2021. Available from: http://dx.doi.org/10.14293/s2199-1006.1.sor-life.a7056644.v1.rysreg
Copy Number Variation and Human Disease [Internet]. [cited 2022 Jun 20]. Available from: https://www.nature.com/scitable/topicpage/copy-number-variation-and-human-disease-741737/
Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, et al. A structural variation reference for medical and population genetics. Nature. 2020;581:444–51.
Gažiová M, Sládeček T, Pös O, Števko M, Krampl W, Pös Z, et al. Automated prediction of the clinical impact of structural copy number variations. Sci Rep. 2022;12:555.
Zhang L, Shi J, Ouyang J, Zhang R, Tao Y, Yuan D, et al. X-CNV: genome-wide prediction of the pathogenicity of copy number variations. Genome Med. 2021;13:132.
Cunningham F, Allen JE, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, et al. Ensembl 2022. Nucleic Acids Res. 2022;50:D988–95.
Woodwark C, Bateman A. The characterisation of three types of genes that overlie copy number variable regions. PLoS One. 2011;6:e14814.
Pös O, Budiš J, Szemes T. Recent trends in prenatal genetic screening and testing. F1000Res [Internet]. 2019;8. Available from: http://dx.doi.org/10.12688/f1000research.16837.1
Pös O, Budis J, Kubiritova Z, Kucharik M, Duris F, Radvanszky J, et al. Identification of Structural Variation from NGS-Based Non-Invasive Prenatal Testing. Int J Mol Sci [Internet]. 2019;20. Available from: http://dx.doi.org/10.3390/ijms20184403
Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, et al. Global diversity, population stratification, and selection of human copy-number variation. Science. 2015;349:aab3761.
Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput Sci. 2020;6:e251.
Nguyen D-Q, Webber C, Ponting CP. Bias of selection on human copy-number variants. PLoS Genet. 2006;2:e20.
Monlong J, Cossette P, Meloche C, Rouleau G, Girard SL, Bourque G. Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res. 2018;46:7236–49.
Hoang D, Sue GR, Xu F, Li P, Narayan D. Absence of aneuploidy and gastrointestinal tumours in a man with a chromosomal 2q13 deletion and BUB1 monoallelic deficiency. BMJ Case Rep [Internet]. 2013;2013. Available from: http://dx.doi.org/10.1136/bcr-2013-008684
Ajeawung NF, Nguyen TTM, Lu L, Kucharski TJ, Rousseau J, Molidperee S, et al. Mutations in ANAPC1, Encoding a Scaffold Subunit of the Anaphase-Promoting Complex, Cause Rothmund-Thomson Syndrome Type 1. Am J Hum Genet. 2019;105:625–30.
Evans DR, Green JS, Johnson GJ, Schwartzentruber J, Majewski J, Beaulieu CL, et al. Novel 25 kb Deletion of MERTK Causes Retinitis Pigmentosa With Severe Progression. Invest Ophthalmol Vis Sci. 2017;58:1736–42.
Tsang HTH, Edwards TL, Wang X, Connell JW, Davies RJ, Durrington HJ, et al. The hereditary spastic paraplegia proteins NIPA1, spastin and spartin are inhibitors of mammalian BMP signalling. Hum Mol Genet. 2009;18:3805–21.
Blauw HM, van Rheenen W, Koppers M, Van Damme P, Waibel S, Lemmens R, et al. NIPA1 polyalanine repeat expansions are associated with amyotrophic lateral sclerosis. Hum Mol Genet. 2012;21:2497–502.
Gamazon ER, Stranger BE. The impact of human copy number variation on gene expression [Internet]. Briefings in Functional Genomics. 2015. p. 352–7. Available from: http://dx.doi.org/10.1093/bfgp/elv017
de Smith AJ, Walters RG, Froguel P, Blakemore AI. Human genes involved in copy number variation: mechanisms of origin, functional effects and implications for disease. Cytogenet Genome Res. 2008;123:17–26.
Zhong Q, Lu M, Yuan W, Cui Y, Ouyang H, Fan Y, et al. Eight-lncRNA signature of cervical cancer were identified by integrating DNA methylation, copy number variation and transcriptome data. J Transl Med. 2021;19:58.
Szilágyi M, Pös O, Márton É, Buglyó G, Soltész B, Keserű J, et al. Circulating Cell-Free Nucleic Acids: Main Characteristics and Clinical Application. Int J Mol Sci [Internet]. 2020;21. Available from: http://dx.doi.org/10.3390/ijms21186827
Carrasco-Ramiro F, Peiró-Pastor R, Aguado B. Human genomics projects and precision medicine. Gene Ther. 2017;24:551–61.
Beyene J, Pare G. Statistical genetics with application to population-based study design: a primer for clinicians. Eur Heart J. 2014;35:495–500.
Valsesia A, Macé A, Jacquemont S, Beckmann JS, Kutalik Z. The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation. Front Genet. 2013;4:92.
Pös Z, Pös O, Styk J, Mocova A, Strieskova L, Budis J, et al. Technical and Methodological Aspects of Cell-Free Nucleic Acids Analyzes. Int J Mol Sci [Internet]. 2020;21. Available from: http://dx.doi.org/10.3390/ijms21228634

Competing interest reported. Zuzana Holesova, Ondrej Pös, Juraj Gazdarica, Marcel Kucharik, Jaroslav Budis, and Tomas Szemes are employees of Geneton Ltd., which is involved in numerous research and development efforts dedicated to adapting new technologies to better understand genomic data and facilitate their implementation in effective and reliable patient care. Michaela Hyblova and Gabriel Minarik are employees of TRISOMYtest Ltd. and Medirex Group Academy n.o.

Holesovaetal.2023SupplementaryMaterialrevision2.docx
