Pilot study of expanded newborn screening for 573 genes related to severe diseases in China: results from 1173 newborns

Background: Current newborn screening (NBS) in China is mainly aimed at detecting biochemical levels of metabolites in the blood, which may generate false-positive or false-negative results. To explore whether next-generation sequencing (NGS) of dried blood spots can increase the detection rate, we carried out a pilot study using NGS in 1,173 newborns who had been tested by traditional NBS. With a focus on inherited metabolic diseases (IMDs), our team investigated the carrier frequencies of genes related to common IMDs in this cohort. Methods: We designed an NGS panel of 573 genes related to severe diseases and performed NBS in 1,173 individuals who had been screened by tandem mass spectrometry (MS/MS) as well as for phenylalanine (Phe), thyroid-stimulating hormone (TSH), 17-α-hydroxyprogesterone (17-OHP), and glucose-6-phosphate dehydrogenase (G6PD) abnormalities in a traditional biochemical NBS conducted in September 2016. We compared the biochemical results with the genetic variants and investigated the carrier frequencies of 77 genes related to disorders detectable by MS/MS in these newborns. Results: The biochemical results showed that four newborns (all male) were positive for G6PD deficiency by enzymatic assay, while the other biochemical findings, including MS/MS, Phe, TSH, and 17-OHP, were negative. Genetic analysis revealed that all four newborns with positive G6PD values harbored hemizygous G6PD mutations. The NGS results also revealed an individual (ID 84123), biochemically negative in 2016, who carried two SLC22A5 mutations (c.760C>T/p.R254* and c.1400C>G/p.S467C) that are common in Chinese patients with carnitine deficiency and were later verified to be in trans. The MS/MS results in 2019 showed free carnitine deficiency, consistent with the genetic analysis findings.
The top five genes with the highest carrier frequencies in these newborns were PAH (1.77%), ETFDH (1.24%), MMACHC (1.15%), SLC25A13 (0.98%), and GCDH (0.80%).

Conclusions: Our study provided data combining biochemical results with genetic variants in 1,173 newborns and confirmed a primary carnitine deficiency patient with false-negative biochemical results. This is also the first study to report the carrier frequencies of 77 IMD-causing genes in China.

Abbreviations: NGS: next-generation sequencing; NBS: newborn screening; DM: disease-causing mutation; MS/MS: tandem mass spectrometry; Phe: phenylalanine; TSH: thyroid-stimulating hormone; 17-OHP: 17-α-hydroxyprogesterone; G6PD: glucose-6-phosphate dehydrogenase; IMDs: inherited metabolic diseases; DBS: dried blood spots; SPCD: systemic primary carnitine deficiency.

Background
Newborn screening (NBS), a successful public health program that prevents morbidity and mortality through early detection and management of specific conditions, can improve the lives of children by allowing timely therapeutic interventions. The screening of newborns for inherited metabolic diseases (IMDs) and other common genetic disorders is now a routine component of neonatal care in many countries. Tandem mass spectrometry (MS/MS) is widely used to detect metabolites related to amino acid metabolism, organic acid metabolism, and fatty acid oxidation in dried blood spots (DBS). MS/MS is currently the principal method used in NBS centers to detect biochemical metabolites (1). In China, apart from MS/MS testing, traditional biochemical NBS also includes tests for phenylalanine (Phe), thyroid-stimulating hormone (TSH), and 17-α-hydroxyprogesterone (17-OHP), as well as glucose-6-phosphate dehydrogenase (G6PD) enzymatic analysis.
The advent of next-generation sequencing (NGS) technologies has enabled the simultaneous and high-throughput detection of gene variants. The rapid development of NGS has reduced the cost of sequencing and genetic analysis such that NGS can now be used in NBS. Genome-wide sequencing in the US has been discussed and applied in 159 newborns (2,3), the results of which suggested that newborn genomic sequencing can effectively detect the risk and carrier status for a wide range of disorders that are not detectable by current NBS assays (4). Genomic sequencing of newborns can help avoid diagnostic odysseys in ill newborns, allow optimal early treatment strategies for affected newborns (5,6), and identify carrier status that could help in future reproductive planning. Despite the anticipated benefits, the major challenges hindering its application include data interpretation and appropriate reporting of information. In our previous research, we designed a panel of 77 genes related to over 40 IMDs and demonstrated that genomic DNA extracted from DBS could generate reliable sequencing data, enabling NGS to function as a second-tier diagnostic test in NBS (7).
Considering these benefits and challenges, we launched a project to screen newborns for genes related to severe diseases. A total of 573 genes related to birth defects were selected based on penetrance, age of onset, and the severity of the causal diseases. A total of 1,173 newborns born in September 2016 were randomly enrolled at the NBS center of Xinhua Hospital in Shanghai. The newborns had all been screened by MS/MS and for Phe, TSH, 17-OHP, and G6PD in traditional biochemical NBS. We compared the biochemical results with the genetic variants obtained using NGS in these 1,173 newborns.
Previous studies have demonstrated that the frequencies of Mendelian disorders in China vary according to ethnic background and region (8). Most of the newborns recruited in this study were born in Shanghai (1,097, 93.52%), and most belonged to the Han ethnicity, which is the largest ethnic group in China. IMDs, a group of rare diseases with an overall incidence of 1/500 in all ethnic cohorts, are caused by enzyme or transporter abnormalities and alterations in biochemical pathways for metabolism (9). IMDs are genetically heterogeneous disorders that are frequently transmitted in an autosomal recessive pattern, and patients with IMDs may present with severe symptoms. The present study investigated the carrier frequencies of IMDs related to amino acid metabolism, organic acid metabolism, and fatty acid oxidation in these randomly selected newborns.

Subjects
This study included 1,173 residual DBS specimens from newborns (600 males and 573 females) at Xinhua Hospital in Shanghai. All 1,173 newborns were randomly selected in September 2016 with consecutive numbers. Among these newborns, 93.52% (1,097 in total; 557 males and 540 females) were born in Shanghai and 6.48% (76 in total; 43 males and 33 females) were born in Macao. All the newborns had undergone traditional biochemical NBS, including MS/MS, Phe, TSH, 17-OHP, and G6PD analysis. Written informed consent was obtained from their parents/legal guardians. This study reused the NBS samples and was approved by the Ethics Committee of Xinhua Hospital, affiliated to Shanghai Jiao Tong University School of Medicine (XHEC-C-2017-021-2).
A total of 573 genes related to more than 200 birth-defect diseases were selected based on penetrance, age of onset, pathogenicity, and mode of inheritance (Additional file 1). Most of the diseases were inherited recessively, and rare diseases with extremely low incidence rates were excluded.

NGS and data analysis
Target-specific primers were generated based on the hg19/GRCh37 human reference genome. The primers were designed to have similar length, GC content, and similar amplicon fragment size to maximize the amplification efficiency. The amplicon fragments included the exons and flanking intronic sequences. DNA extracted from a single DBS was generally enough for target capture and library preparation. The amplicons were sequenced on a NovaSeq 6000 instrument (Illumina, San Diego, CA, USA) according to the manufacturer's protocol. Paired-end reads were aligned to the NCBI reference sequence (hg19/GRCh37) using BWA and variant calls were made using GATK. Primary analyses of the NGS data included sequence alignment, post-processing, and variant calling.
Hyperphenylalaninemia caused by PAH mutations is a common inherited metabolic disease in China. According to the ClinGen PAH expert panel specifications to the ACMG/AMP variant interpretation guidelines, the threshold for BS1 (allele frequency greater than expected for the disease) is 0.2% in the race-matched population (10). Variants with a frequency of >0.2% in the population variant databases, namely the Genome Aggregation Database (gnomAD) and the 1000 Genomes East Asian database, were filtered out. Considering the relatively high carrier frequencies of metabolic diseases in East Asian populations and the relatively small cohort in this study, variants with allele counts of more than 3 in these 1,127 samples were also filtered out. Subsequently, variants that met either of the following criteria were defined as candidate variants: variants classified as disease-causing mutations (DM) or probable disease-causing mutations (DM?) in the Human Gene Mutation Database (HGMD); or frameshift, stop-gained, stop-lost, and splice donor/acceptor variants, which are expected to have a high impact. For G6PD, the hotspot mutations were not required to meet the allele frequency or allele count thresholds because of their high frequencies. Traditional biochemical data were compared with the information on genetic variants.
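The filtering rules described above can be expressed compactly in code. The following is a minimal sketch rather than the pipeline used in this study: the `Variant` record, its field names, and the consequence labels are hypothetical, and it assumes that G6PD hotspot variants are exempt only from the frequency and allele-count thresholds.

```python
from dataclasses import dataclass
from typing import Optional

MAX_POPULATION_AF = 0.002     # 0.2% threshold in East Asian population databases
MAX_COHORT_ALLELE_COUNT = 3   # variants seen more than 3 times in the cohort are dropped
# Hypothetical consequence labels for the high-impact variant types named in the text.
HIGH_IMPACT = {"frameshift", "stop_gained", "stop_lost",
               "splice_donor", "splice_acceptor"}


@dataclass
class Variant:
    gene: str
    population_af: float        # highest East Asian allele frequency across databases
    cohort_allele_count: int    # allele count among the sequenced newborns
    hgmd_class: Optional[str]   # "DM", "DM?", or None if not listed in HGMD
    consequence: str
    is_g6pd_hotspot: bool = False


def is_candidate(v: Variant) -> bool:
    """Apply the frequency filters first, then the pathogenicity criteria."""
    if not v.is_g6pd_hotspot:   # assumption: hotspots skip only these two filters
        if v.population_af > MAX_POPULATION_AF:
            return False
        if v.cohort_allele_count > MAX_COHORT_ALLELE_COUNT:
            return False
    return v.hgmd_class in {"DM", "DM?"} or v.consequence in HIGH_IMPACT
```

Under these assumptions, a rare HGMD "DM" missense variant is retained, whereas the same variant with a population frequency of 1% is discarded.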

NGS assay
Among the 1,173 DBS specimens, genomic DNA extracted from 1,127 samples (551 females and 576 males) passed quality control and was sequenced, while 46 DBS failed to meet quality control standards. The average coverage of the target region was 99.97%, and 99.2% of reads were properly paired and mapped to the reference genome.
We failed to contact the parents of ID 84066. As for ID 84123, compound heterozygous SLC22A5 mutations were verified, including c.760C>T (p.R254*) inherited from his father and c.1400C>G (p.S467C) inherited from his mother (Fig. 1). The two variants are common mutations in Chinese patients with carnitine deficiency. MS/MS repeated for ID 84123 in May 2019 showed free carnitine deficiency, which was consistent with the results of the genetic analysis, indicating a false-negative MS/MS finding at birth.

IMDs carrier frequencies
The screening population included 1,173 newborns with an average age of 3.7 days; 1,097 (93.52%) were born in Shanghai and 76 (6.48%) were born in Macao. Among the 1,173 DBS specimens, genomic DNA extracted from 1,127 (576 males and 551 females) passed quality control and was sequenced. This study calculated the carrier frequencies of 77 genes (Table S2) related to amino acid metabolism, organic acid metabolism, and fatty acid oxidation.
The variants used to calculate carrier frequencies are listed in Table S3. Based on our criteria, candidate variants were identified in 51 of the genes listed in Table S2 (Table 2).
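As an illustration of how per-gene carrier frequencies can be derived from such variant lists, the sketch below counts each newborn once per gene and divides by the number of sequenced samples. The function name and the toy input are hypothetical, and counting each newborn once per gene is an assumption rather than a documented detail of this study.

```python
from collections import Counter
from typing import Dict, Set


def carrier_frequencies(genes_by_sample: Dict[str, Set[str]],
                        n_sequenced: int) -> Dict[str, float]:
    """Fraction of sequenced newborns with a candidate variant in each gene."""
    # A set per sample ensures each newborn is counted at most once per gene.
    counts = Counter(gene for genes in genes_by_sample.values() for gene in genes)
    return {gene: n / n_sequenced for gene, n in counts.items()}


# Hypothetical toy input: three samples with candidate variants out of ten sequenced.
toy = {"s1": {"PAH"}, "s2": {"PAH", "GCDH"}, "s3": {"ETFDH"}}
freqs = carrier_frequencies(toy, n_sequenced=10)
```

With the figures reported in this study, 20 PAH carriers among 1,127 sequenced newborns correspond to 20/1,127, or about 1.77%.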

Discussion
The development of NGS has revolutionized genetic research and NGS methods have been proven to be effective for detecting genetic disorders (12). Since our previous study demonstrated that DNA extracted from DBS could be used for NGS, we designed a gene panel consisting of 77 genes related to amino acid metabolism, organic acid metabolism, and fatty acid oxidation and compared the specificity and sensitivity of the sequencing data (7). This cohort included 1,173 DBS, among which 1,127 (96.1%) qualified for NGS.
Several factors can affect the yield, including individual differences in white blood cells, in addition to DNA degradation rates due to storage of the DBS samples for over a year before retrieval for NGS. The subjects in our study were born in September 2016 and their NBS samples were continuously numbered from ID 83038 to 84210.
Comparisons of biochemical screening and genetic results
Four newborns (all males) had low G6PD enzyme levels with hemizygous variants classified as DM or DM? in the HGMD database. The biochemical results for G6PD were consistent with the genetic variants, indicating the high sensitivity and specificity of the G6PD enzymatic assay. G6PD deficiency is an X-linked genetic defect that arises from mutations in G6PD and is the most common human enzyme defect. The most frequent clinical manifestations of G6PD deficiency are neonatal jaundice and acute hemolytic anemia, which are usually triggered by an exogenous agent. Heterozygous females generally have less severe clinical manifestations than G6PD-deficient males (11). The highest frequencies of G6PD deficiency occur in tropical Africa, tropical/subtropical Asia, the Mediterranean region, and the Middle East. The global distribution of G6PD deficiency is similar to that of malaria-endemic areas, suggesting that G6PD deficiency confers resistance against malaria (13,14). In this study, the four newborns with low G6PD levels included two males born in Shanghai (ID 83162, 84010) and two males born in Macao (ID 83841, 83847). In this cohort, 557 males were born in Shanghai, two of whom had G6PD deficiency. Among the 43 males born in Macao, which is located in the subtropical area of Asia, two had G6PD deficiency. These findings suggested a higher prevalence of G6PD deficiency in Macao (2/43 males) than in Shanghai (2/557 males).
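Because the comparison between Macao and Shanghai rests on only four cases, one way to examine it is an exact test on the 2x2 table of G6PD-deficient and non-deficient males. The sketch below is an illustration added here, not an analysis reported in this study; it implements a two-sided Fisher exact test with the standard library, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(k: int) -> float:
        # Hypergeometric probability of k cases in row 1 with all margins fixed.
        return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)

    p_observed = prob(a)
    k_min = max(0, row1 - (total - col1))
    k_max = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_observed * (1 + 1e-9))


macao_rate = 2 / 43       # G6PD-deficient males born in Macao
shanghai_rate = 2 / 557   # G6PD-deficient males born in Shanghai
# Rows: Macao, Shanghai; columns: deficient, not deficient.
p_value = fisher_exact_two_sided(2, 41, 2, 555)
```

An exact test is chosen here because the expected counts are far too small for a chi-squared approximation.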
Three infants had high 17-OHP levels at birth, which were normal when they were recalled and retested in October 2016. Re-examination of false-positive findings from 17-OHP enzymatic screening at birth can be time-consuming. Thus, a second-tier test may be needed to improve the efficacy of this screening and reduce the number of false-positive findings.
In addition to the cases mentioned above, two children with negative biochemical results harbored DUOX2 and SLC22A5 variants, respectively. (17). The newborn (ID 84066) in this study harbored the same variants and a normal TSH level (1.5 mU/L; the cut-off value for TSH in our lab is 10 mU/L). The infant was born at full term (38.5 weeks) with normal weight (3.095 kg); therefore, the possibility of a late-onset TSH increase was low. We failed to contact the family of the newborn for further study; however, there are two possible explanations for these findings. First, the two variants may be located on the same allele; thus, there would be no impact on her thyroid. Second, this observation could also have resulted from a false-negative biochemical result. Our study has certain limitations. Because we did not collect blood samples from the parents, the proband-only sequencing approach made it hard to distinguish whether two variants in a recessive gene were bi-allelic or located on the same allele, unless the two variants were close enough to be phased from the sequencing reads in the Integrative Genomics Viewer.
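The value of parental samples for resolving this ambiguity can be illustrated with a small sketch. It assumes that both variants are inherited rather than de novo and that parental genotypes are simply recorded as carrier or non-carrier; the function name and its arguments are hypothetical.

```python
from typing import Optional


def infer_phase(father_has_a: bool, mother_has_a: bool,
                father_has_b: bool, mother_has_b: bool) -> Optional[str]:
    """Return 'trans', 'cis', or None when parental carrier status cannot resolve phase."""
    a_paternal = father_has_a and not mother_has_a
    a_maternal = mother_has_a and not father_has_a
    b_paternal = father_has_b and not mother_has_b
    b_maternal = mother_has_b and not father_has_b
    # One variant from each parent: the variants sit on different alleles.
    if (a_paternal and b_maternal) or (a_maternal and b_paternal):
        return "trans"
    # Both variants from the same parent: they were transmitted on one allele.
    if (a_paternal and b_paternal) or (a_maternal and b_maternal):
        return "cis"
    return None   # a variant in both or neither parent is left unresolved


# Pattern reported for ID 84123: c.760C>T from the father, c.1400C>G from the mother.
phase_84123 = infer_phase(True, False, False, True)
```

For ID 84066, no such inference was possible because the parents could not be contacted.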
Homozygous or compound heterozygous mutations in SLC22A5 can cause systemic primary carnitine deficiency (SPCD) and lead to impaired fatty acid oxidation in muscles (18). This study identified one case (ID 84123) with compound heterozygous SLC22A5 variants. His C0 (free carnitine) value was 11.6 μmol/L (reference: 10-60 μmol/L) at birth in September 2016. When he was called back in April 2019 because of his SLC22A5 mutations, his C0 value was 4.3 μmol/L, which was below the normal range and could be classified as SPCD. SPCD encompasses a broad clinical spectrum including metabolic decompensation in infancy, childhood myopathy involving heart and skeletal muscle, pregnancy-related decreased stamina or exacerbation of cardiac arrhythmia, fatigability in adulthood, or absence of symptoms (19). Genomic screening can avoid a diagnostic odyssey and allow optimal early treatment strategies in affected newborns. The child is nearly 3 years old and has not yet shown any symptoms. This child should receive regular physical examinations and, if needed, treatment to maintain his plasma carnitine levels and prevent primary manifestations.

Carrier frequencies
Our team focused on IMDs, and we determined the carrier frequencies of 77 genes related to disorders detectable by MS/MS, including disorders of amino acid metabolism, organic acid metabolism, and fatty acid oxidation, which are listed in Table S2. Candidate variants were identified in 51 genes among the 1,127 subjects; in the other 26 genes, no DM or DM? variants meeting our criteria were detected. The top five genes with the highest carrier frequencies in these newborns were PAH (1.77%), ETFDH (1.24%), MMACHC (1.15%), SLC25A13 (0.98%), and GCDH (0.80%). According to previous studies in our lab, the prevalence of hyperphenylalaninemia (HPA) was 1:11,763 and the prevalence of methylmalonic acidemia and hyperhomocysteinemia, cblC type, was 1:28,000 (20,21). The carrier frequencies of PAH and MMACHC were consistent with these prevalences, supporting the reliability of our data.
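The consistency between carrier frequency and disease prevalence can be checked with a Hardy-Weinberg calculation. The sketch below is an illustration under simplifying assumptions: random mating, a single gene accounting for all cases of each disease, and detection of all pathogenic alleles. The function name is hypothetical.

```python
from math import sqrt


def expected_carrier_frequency(prevalence: float) -> float:
    """Expected heterozygote frequency 2pq under Hardy-Weinberg equilibrium."""
    q = sqrt(prevalence)    # disease allele frequency, since prevalence = q**2
    return 2 * q * (1 - q)  # p = 1 - q


pah_expected = expected_carrier_frequency(1 / 11_763)     # HPA prevalence
mmachc_expected = expected_carrier_frequency(1 / 28_000)  # cblC-type prevalence
pah_observed = 20 / 1127   # 20 PAH carriers among 1,127 sequenced newborns
```

Under these assumptions, the expected carrier frequencies are roughly 1.8% for PAH and 1.2% for MMACHC, close to the observed 1.77% and 1.15%.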
To the best of our knowledge, this is the first study to report the carrier frequencies of this many genes in China.
In 2018, a pilot study of expanded carrier screening in China investigated 11 recessive diseases, including PAH deficiency, reporting a PAH carrier frequency of 3.59% among all ethnicities in China. The results varied among different ethnicities; the PAH frequency for Han ethnicity was 3.30% (8), which was higher than that in our study (1.77%). The subjects in our study were mainly born in Shanghai and the population was comprised mostly of individuals of the Han ethnicity. One explanation for the discordance in carrier frequencies between the two studies may lie in the difference in the criteria used to determine variants.
Another study in 2016 applied a molecular approach to NBS for four genetic diseases in Guizhou Province of southern China, including beta-thalassemia, G6PD deficiency, PKU, and non-syndromic hearing loss. The study included 515 newborns, selected 10 common PAH mutations in the Chinese population, and reported a PAH carrier frequency of 0.78% (22). This frequency was much lower than that in our study (1.77%). Our data identified 20 newborns carrying PAH variants at 17 different positions, all of which were classified as DM or DM?. This comparison also suggests that PAH is better screened across all exons than by a limited set of hotspot mutations.

Conclusions
Our study provides data combining biochemical results and genetic variants in newborns, which could be helpful for the evaluation of other expanded NBS strategies.

Consent for publication
The participants signed the letter of consent in the study and they also gave their authorization for the publication of the results.

Availability of data and materials
The data generated or analyzed during this study are included in this article and its supplementary files.

Competing interests
The authors declare that they have no conflict of interest.

Supplementary Files
Supplementary files associated with the primary manuscript: Table S1.xlsx, Table S2.docx, Table S3.xlsx.