Newborn Screening in Unselected Children Using Genomic Sequencing

Background: The aim of this study was to investigate potentially curable or treatable medical conditions in unselected newborns using genomic sequencing (GS). Methods: A total of 321 newborns from a cohort of pregnant women in Qingdao, China, underwent high-depth GS (average 47.42-fold) with the approval of the ethics committee. Genes associated with 61 Mendelian diseases, 151 primary immunodeficiency disease (PID) genes and 5 pharmacogenetic (PGx) genes in the "Essential" category recommended by the Dutch Pharmacogenetics Working Group (DPWG) were analyzed. Results: A total of 121 pathogenic or likely pathogenic Mendelian variants associated with 31 inherited diseases were detected. Among these, hearing loss, congenital hypothyroidism, methylmalonic acidemia, methylmalonic acidemia with homocystinuria, phenylketonuria (PKU) and benign hyperphenylalaninemia accounted for half of the carrier variants. Three children with compound heterozygous variants in GJB2 or PAH were confirmed by Sanger sequencing. Follow-up of the three families confirmed that one child was diagnosed with PKU; the two children with GJB2 variants were scheduled, after genetic counseling, to undergo hearing testing every six months because of the incomplete penetrance of hearing loss. Eleven heterozygous pathogenic or likely pathogenic variants in eight PID genes were identified in 11 infants. All 321 newborns carried at least one variant in the five DPWG-recommended PGx genes. Prescribing codeine and clopidogrel requires particular attention, as 25% and 8% of newborns have decreased function of the CYP2D6 and CYP2C19 enzymes, respectively. Conclusions: Our study is the largest to date using GS to sequence unselected newborns. The results suggest that GS may be a suitable method for screening newborns for variants in a large number of disease-associated genes.


Background
Newborn screening for metabolic diseases was initiated in the 1960s and was based on the early identification of children with phenylketonuria (Følling's disease) as a cause of childhood mental retardation (1). The development in 1963 of an assay based on examination of a dried blood spot (DBS) on filter paper (Guthrie card) (2) formed the basis of a technically simple screening assay, allowing the identification and subsequent dietary treatment of affected children.
Extended screening for a large number of additional metabolic diseases, aided by the introduction of mass spectrometry methods, has been gradually implemented, albeit to a different extent in various countries (https://membership.isns-neoscreening.org/disorders/). The ethical guidelines published by Wilson and Jungner in 1968 (3) remain the gold standard for deciding which diseases to screen for, and the current list of recommended disorders in the US contains 35 core and 26 secondary conditions (RUSP/HHS) (https://www.hrsa.gov/advisory-committees/heritable-disorders/rusp/index.html) (3).
DNA-based screening using the DBS has recently been introduced, and screening for cystic fibrosis is included in screening programs in selected countries.
Screening for severe combined immunodeficiency (SCID) (T cell lymphopenia) using quantification of T cell receptor excision circles (TREC) (4) has also been implemented, starting in the USA in 2010 (5). Screening for kappa-deleting recombination excision circles (KREC) (B cell lymphopenia in X-linked agammaglobulinemia (XLA)) followed in 2011 (6), and a combined assay is used in selected countries. Currently, testing for other diseases such as spinal muscular atrophy (SMA) is being implemented, and additional conditions are being considered for national genetic screening of newborns.
The NSIGHT initiative (NIH), 2012-2018 (https://www.genome.gov/Funded-Programs-Projects/Newborn-Sequencing-in-Genomic-Medicine-and-Public-Health-NSIGHT), aimed at large-scale Newborn Sequencing in Genomic Medicine and Public Health using exome sequencing (ES) or genomic sequencing (GS). The supported projects addressed data collection, identification of specific disorders, and the ethical, legal and social implications of genomic sequencing of newborns.
Different methods for the identification of mutated genes, including Targeted Region Sequencing (TRS), ES and GS, are currently standard procedure in newborn children with a suspected disease. However, sequencing of apparently healthy newborns remains controversial as it may violate the Wilson and Jungner criteria. Yet, ES and GS have already been performed on small cohorts of "healthy" newborn children (7,8). The latter study identified actionable adult-onset disease in 3.5% and actionable childhood-onset disease in 9.4% of the tested children. Furthermore, relevant pharmacogenomic variants were identified in 5% of the newborns, supporting the concept of introducing ES/GS-based screening of all newborns.
The current study is the largest to date using GS on unselected newborns, looking at a wide panel of inherited diseases, primary immunodeficiency diseases and pharmacogenomically relevant variants.

Study Subjects
Each participant was provided with a report on the results of the genetic testing, a report interpretation and genetic counselling. Positive genetic test results will be followed up annually until the child is three years old.

Sample collection
Umbilical cord blood (5 ml) and umbilical cord (3 tubes, 1 cm per tube) were collected. Umbilical cord blood DNA was preferred for GS; if the cord blood collection failed, DNA was extracted from the umbilical cord instead. The GS cohort contained DNA from 303 umbilical cord blood samples and 18 umbilical cords.

Processing of samples
Umbilical cord blood DNA was extracted with the HiPure Blood DNA Mini Kit (Magen, Guangzhou, China), whereas umbilical cord DNA was extracted with the Saltingout Self-dispensing Kit. After DNA extraction, a Qubit 3.0 fluorometer (Life Technologies, Paisley, UK) was used to measure the DNA concentration, and 2% agarose gel electrophoresis was used to assess DNA fragment integrity.

Sequencing
Extracted DNA subsequently underwent library construction and was sequenced on the DIPSEQ sequencing platform of MGI (MGI, Shenzhen, China) with 100 bp paired-end reads. Briefly, genomic DNA was normalized and processed for circularization (9). Genomic DNA was heat-denatured at 95 °C for 3 minutes to obtain single-stranded DNA, which was then mixed with reagents of the MGIEasy™ DNA Library Prep Kit (MGI, Shenzhen, China) and incubated at 37 °C for 30 minutes to complete circularization into a single-stranded DNA circle (ssDNA circle). The resulting ssDNA circle was then used to generate DNA nanoballs (DNBs) by rolling circle amplification (RCA) (10). After RCA and the formation of DNBs, the final product was quantified by Qubit using the ssDNA HS Assay kit (Invitrogen) and loaded on a DNBSEQ-500 platform (MGI, Shenzhen, China) for sequencing (11) following the manufacturer's instructions.

Analysis pipeline
The alternative contigs in the GRCh38 assembly were removed to improve alignment accuracy, and BWA-MEM was used to align the reads to the human reference genome (GRCh38 / UCSC hg38). The Genome Analysis Toolkit (GATK 4.0) best practice pipeline was used for variant calling, including SNVs (single nucleotide variants) and short InDels (insertions/deletions). After variant calling, bcftools was used to extract variants in the genes associated with 61 Mendelian diseases (MD), 151 primary immunodeficiency (PID) genes and 5 pharmacogenetic genes associated with severe adverse drug reactions. Subsequently, all samples were merged using bcftools merge. All missing alleles were assumed to carry no variant.
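The assumption that missing alleles carry no variant can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' pipeline: the function names are hypothetical, and the input is assumed to be VCF-style `GT` strings such as `0/1` or `./.` taken from a merged call set.

```python
def fill_missing_as_reference(gt: str) -> str:
    """Replace missing alleles ('.') in a VCF-style GT string with the reference allele '0'."""
    sep = "|" if "|" in gt else "/"  # keep the phased/unphased separator as given
    return sep.join("0" if allele == "." else allele for allele in gt.split(sep))


def alt_allele_count(gt: str) -> int:
    """Count non-reference alleles after treating missing alleles as reference."""
    filled = fill_missing_as_reference(gt)
    sep = "|" if "|" in filled else "/"
    return sum(1 for allele in filled.split(sep) if allele != "0")


# Hypothetical genotypes for one site across four samples after merging.
merged_site = ["0/1", "./.", "1/1", ".|1"]
filled_site = [fill_missing_as_reference(gt) for gt in merged_site]
alt_counts = [alt_allele_count(gt) for gt in merged_site]
```

One consequence of this convention is that a sample with no call at a site is counted as a non-carrier, so carrier counts are only as reliable as the coverage at that site.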

Quality control (QC)
To ensure high data quality for each sample, stringent quality control criteria were applied: GC content of the sequencing reads within 40%-44%, average Q30 above 80%, duplication rate below 10%, average depth above 20x, percentage of non-N regions with ≥4x coverage above 96%, Ti/Tv (transition/transversion) ratio within 1.96-2.02 and het/hom (heterozygous/homozygous) ratio within 1.3-1.7.
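The thresholds listed above translate directly into a per-sample check. The following Python is a minimal sketch under stated assumptions: the `SampleQC` field names are hypothetical, the two ranges are treated as inclusive, and "above"/"below" are treated as strict inequalities, none of which is specified in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleQC:
    gc_content: float        # percent GC of the sequencing reads
    q30: float               # percent of bases with quality >= Q30
    duplication_rate: float  # percent duplicated reads
    mean_depth: float        # average sequencing depth (x)
    cov_4x: float            # percent of non-N regions covered at >= 4x
    ti_tv: float             # transition/transversion ratio
    het_hom: float           # heterozygous/homozygous ratio


def qc_failures(s: SampleQC) -> List[str]:
    """Return the names of the metrics that fall outside the stated thresholds."""
    checks = {
        "gc_content": 40.0 <= s.gc_content <= 44.0,
        "q30": s.q30 > 80.0,
        "duplication_rate": s.duplication_rate < 10.0,
        "mean_depth": s.mean_depth > 20.0,
        "cov_4x": s.cov_4x > 96.0,
        "ti_tv": 1.96 <= s.ti_tv <= 2.02,
        "het_hom": 1.3 <= s.het_hom <= 1.7,
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical samples: only the depth of 47.42x is taken from the text.
good = SampleQC(41.5, 90.0, 3.0, 47.42, 99.0, 2.0, 1.5)
shallow = SampleQC(41.5, 90.0, 3.0, 18.0, 99.0, 2.0, 1.5)
```

Returning the list of failed metrics, rather than a single pass/fail flag, makes it easier to see why a sample was excluded.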

Gene lists
The 61 Mendelian diseases and their 109 associated RUSP-annotated genes are shown in Supplementary Table 1. The 151 genes suggested by the International Union of Immunological Societies (IUIS) Expert Committee for Primary Immunodeficiency (14,15), which are associated with the most severe conditions (including immunodeficiencies affecting cellular and humoral immunity, combined immunodeficiencies with associated or syndromic features and predominantly antibody deficiencies), were manually reviewed and are shown in Supplementary Table 2. The five genes suggested by the Dutch Pharmacogenetics Working Group (DPWG) to be associated with adverse drug reactions are given in Supplementary Table 3. Each gene and its mode of inheritance were manually reviewed according to OMIM or the published literature.

Validation of variants
Pathogenic or likely pathogenic variants of inherited metabolic diseases were verified by Sanger sequencing.

Quality control
The demographic data of the 321 newborns are summarized in Table 1. Thirty-one inherited diseases were associated with the 121 pathogenic and likely pathogenic variants (Table 2), with hearing loss being the most common. Twenty-one newborns carried more than two genetic variants (Supplementary Table 4). Sanger sequencing confirmed that one variant was inherited from the child's mother. However, as the father's sample was not available, we could not determine whether the small deletion and insertion was inherited from the father or was a de novo variant (Supplementary Fig. 1). Follow-up of the three families confirmed that the child with compound heterozygosity in PAH has been diagnosed with PKU, while the two children with GJB2 variants have not yet shown signs of hearing loss. Previous studies report that homozygous or compound heterozygous variants involving c.109G>A are associated with light to mild deafness and show incomplete penetrance, which can lead to late-onset deafness (16,17). After genetic counseling, the two children with GJB2 variants were therefore scheduled to undergo hearing testing every six months.
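The distinction drawn above between carriers and children with two variants in the same gene can be sketched as a simple per-gene classification. This Python is a hedged illustration, not the authors' method: the function name and labels are hypothetical, the variant names other than c.109G>A are placeholders, and the check does not establish phase, which is why parental testing (as with the Sanger sequencing described above) is still needed to confirm compound heterozygosity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One call per variant: (gene, variant label, number of alternate alleles: 0, 1 or 2).
Call = Tuple[str, str, int]


def classify_by_gene(calls: List[Call]) -> Dict[str, str]:
    """Group pathogenic/likely pathogenic calls by gene and label each gene."""
    per_gene: Dict[str, List[int]] = defaultdict(list)
    for gene, _variant, alt_count in calls:
        if alt_count > 0:  # ignore reference genotypes
            per_gene[gene].append(alt_count)

    labels: Dict[str, str] = {}
    for gene, counts in per_gene.items():
        if any(count == 2 for count in counts):
            labels[gene] = "homozygous"
        elif len(counts) >= 2:
            # Two different heterozygous variants; phase is not established here.
            labels[gene] = "possible compound heterozygous"
        else:
            labels[gene] = "heterozygous carrier"
    return labels


# Hypothetical calls for one child; only c.109G>A is a variant named in the text.
example_calls = [
    ("GJB2", "c.109G>A", 1),
    ("GJB2", "hypothetical_second_variant", 1),
    ("PAH", "hypothetical_variant", 1),
]
example_labels = classify_by_gene(example_calls)
```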

Primary immunodeficiency diseases
The IUIS summary information on PID genes (15) was used to identify potential variants in 151 immunodeficiency-associated genes. Altogether, 11 heterozygous pathogenic or likely pathogenic variants in eight genes were identified in 11 of the 321 newborns (Table 3). All of these variants were detected in the heterozygous state; no child was found to carry homozygous or compound heterozygous variants in the immunodeficiency genes recorded as recessive in the IUIS summary.
[...] (20) with the goal of establishing guidance on the drugs for which testing of specific genetic variants is warranted. The CIS is translated into a three-category recommendation for testing: Essential, Beneficial and Potentially Beneficial (Supplementary Table 4).
In this study, we focused only on the gene-drug pairs in the DPWG Essential category. Every newborn in the Qingdao cohort carried at least one clinically relevant variant in the Essential PGx genes (Fig. 1). Among these genes, CYP2D6 had the highest carrier rate (Table 4), with 266 of 321 infants carrying at least one relevant variant. In total, 150 infants carried one copy of *10 (rs1065852) and 81 infants carried two copies, suggesting that at least 25% of infants have decreased CYP2D6 function in codeine metabolism (21,22). CYP2C19 showed the second highest carrier rate, with 209 of 321 infants carrying at least one clinically relevant variant. Twenty-five newborns were homozygous for CYP2C19 *2 (*2/*2) and one for *3 (*3/*3), which would lead to a lack of enzyme activity and reduced metabolism of clopidogrel via the CYP2C19 pathway (23). In addition, 133 and 122 infants carried variants in UGT1A1 and NUDT15, respectively. Homozygous variants in UGT1A1 (24,25) and NUDT15 (26,27) result in reduced metabolism of irinotecan, azathioprine, mercaptopurine and tioguanine. No clinically relevant variant was detected in DPYD (Table 4).
Notes to Table 4: The gene-drug pairs refer to the DPWG "Essential" category. MAF data refer to the 1000 Genomes phase 3 dataset. NA indicates that no data were available from the dataset and that the variant can thus not be detected by the current pipeline.
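The percentages quoted for CYP2D6 and CYP2C19 follow directly from the reported counts, as the short calculation below shows. The variable names are illustrative, and the *10 allele frequency is a derived figure that assumes the 150 one-copy and 81 two-copy carriers are disjoint groups, as the text implies.

```python
COHORT_SIZE = 321


def percent(count: int, total: int = COHORT_SIZE) -> float:
    """Express a count as a percentage of the cohort."""
    return 100.0 * count / total


# Counts as reported in the text.
cyp2d6_carrier_pct = percent(266)      # infants with at least one relevant CYP2D6 variant
cyp2d6_hom_pct = percent(81)           # infants with two copies of CYP2D6 *10
cyp2c19_carrier_pct = percent(209)     # infants with at least one relevant CYP2C19 variant
cyp2c19_hom_pct = percent(25 + 1)      # CYP2C19 *2/*2 plus *3/*3 homozygotes

# Allele frequency of CYP2D6 *10: one copy in 150 infants, two copies in 81,
# out of 2 alleles per infant.
star10_allele_freq = (150 + 2 * 81) / (2 * COHORT_SIZE)

print(f"CYP2D6 *10 homozygotes: {cyp2d6_hom_pct:.1f}%")
print(f"CYP2C19 homozygotes:    {cyp2c19_hom_pct:.1f}%")
print(f"CYP2D6 *10 allele freq: {star10_allele_freq:.3f}")
```

The two homozygote percentages round to the 25% and 8% figures given for decreased CYP2D6 and CYP2C19 function.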
We further investigated the differences in allele frequency between the Qingdao cohort and five super-populations of the 1000 Genomes dataset: East Asians (EAS), South Asians (SAS), Africans (AFR), Europeans (EUR) and admixed Americans (AMR). In most cases, the allele frequencies of the Qingdao cohort are consistent with the EAS dataset, while those of the other four populations differ significantly (Table 4).
[...] where "99.8% of participants had a genotype associated with increased risks to at least one medication" (30). Moreover, a retrospective analysis of 120 pharmacogenomics genes across 26 global populations in the 1000 Genomes dataset reported a median of three clinical variants per individual, with East Asians having the highest percentage of loss-of-function variants among the super-populations (60.9%) (31). Therefore, the results suggest that GS may be a suitable method for screening newborns for variants in a large number of disease-associated genes.
The Wilson-Jungner guidelines (3) have provided the gold standard for deciding which conditions should be screened for in newborns since 1968. These guidelines have served us well and state that only disorders amenable to treatment should be investigated. However, although not curable, cystic fibrosis has been included in the screening programs of several countries. Similarly, screening for spinal muscular atrophy has recently been added to the RUSP list (5), in spite of the prohibitively expensive therapy needed to mitigate the disease, which precludes its implementation in many low- and middle-income countries.
However, novel forms of therapy for hitherto incurable diseases, including SMA, are changing the indications for therapy and, consequently, the newborn screening process. Thus, gene therapy promises to change the list of diseases which could or should be analyzed in the future. Furthermore, severe combined immunodeficiency (SCID) has recently been added to the RUSP list and screening has been initiated in many countries worldwide, in spite of the fact that "incurable" diseases are identified during the process (including trisomy 21, Nijmegen breakage syndrome and ataxia-telangiectasia). The latter also raises the question of what actually constitutes a curable or treatable medical condition. In the case of ataxia-telangiectasia, correct identification allows prophylaxis against the accompanying infections (due to immunodeficiency) but will not affect the neurological problems which subsequently develop. Yet, parents are clearly in favor of being informed about the diagnosis (32) and the mitigating therapy available, even though curative treatment is currently not possible.
Previous studies using ES/GS on newborns have indiscriminately included OMIM-defined diseases (7,8). In our study, however, we chose a limited set of disorders comprising the currently recommended inherited disorders, selected primary immunodeficiency disorders and genes associated with drug reactions (using a standardized set of genes for the latter, based on the available guidelines listed by the Clinical Pharmacogenetics Implementation Consortium and the Dutch Pharmacogenetics Working Group), thus limiting the search to genetic variants of known clinical importance. Although the number of children affected by variants in the first two categories was limited, every assessed child carried at least one genetic variant potentially associated with adverse drug reactions, suggesting a major potential for improving personalized drug safety for children.
One important aspect when screening for disorders in a given population is the use of a matched control database, as variants can be highly specific to a given ethnic group (33,34). Most databases published to date are based on individuals of European descent, and many populations are poorly represented. The GenomeAsia 100K project (35) aims to address this gap by sequencing a large number of individuals from different Asian populations. Such a reference is essential, as an unexpectedly high allele frequency of a given variant may be restricted to a particular ethnic population.

Conclusion
In this study, we applied GS to unselected newborns to investigate potentially curable or treatable medical conditions in 321 children from Qingdao. Selective identification of genetic data for which therapeutic options are available does not violate the Wilson-Jungner criteria. It also provides a basis for future research on variants of unknown significance in an expanding number of genes and should therefore be considered in future screening programs for all newborns.

Availability of data and material
The data that support the findings of this study have been deposited into the CNGB Sequence Archive (12) of CNGBdb with accession number CNP0001264 (13).

Competing interests
The authors declare that they have no competing interests.