Allele-specific PCR and next generation sequencing based genetic screening for Congenital Adrenal Hyperplasia in India.

Genetic screening of Congenital Adrenal Hyperplasia (CAH) is known to be challenging due to the complexities in CYP21A2 genotyping and has not been the first-tier diagnostic tool in routine clinical practice. Also, with the advent of massively parallel sequencing technology, there is a need to investigate its utility in screening an extended panel of genes implicated in CAH. In this study, we have established and utilized an Allele-Specific Polymerase Chain Reaction (ASPCR) based approach for screening eight common mutations in the CYP21A2 gene, followed by targeted Next-Generation Sequencing (NGS) of the CYP21A2, CYP11B1, CYP17A1, POR, and CYP19A1 genes in 72 clinically diagnosed CAH subjects from India. Through these investigations, 88.7% of the subjects with 21-hydroxylase deficiency were positive for the eight CYP21A2 mutations with ASPCR. The targeted NGS assay was sensitive enough to pick up all the variants identified by ASPCR. Additionally, five study subjects were homozygous positive for other CYP21A2 variants: one with a novel c.1274G>T, three with c.1451G>C and one with c.143A>G variant. One subject was compound heterozygous for the c.955C>T and c.1042G>A variants identified using ASPCR and NGS. One subject suspected of Simple Virilizing (SV) 21-hydroxylase deficiency was positive for a CYP19A1:c.1142A>T variant. CYP11B1 variants (c.1201-1G>A, c.1200+1delG, c.412C>T, c.1024C>T, c.1012dup, c.623G>A) were identified in all six subjects suspected of 11 beta-hydroxylase deficiency. The overall mutation positivity was 97.2%. Our results suggest that ASPCR followed by targeted NGS is a cost-effective and comprehensive strategy for screening CYP21A2 mutations and the CAH panel of genes in a clinical setting.


Introduction
Congenital Adrenal Hyperplasia (CAH) includes a heterogeneous group of autosomal recessive disorders resulting from molecular defects in any one of the enzymes involved in adrenal steroidogenesis. Deficiency of 21-hydroxylase, an enzyme that is crucial for the synthesis of aldosterone and cortisol, accounts for over 90% of patients with CAH [1]. Unequal crossing over and gene conversion events between the functional CYP21A2 gene and the non-functional CYP21A1P pseudogene contribute to 95% of mutations in 21-OH deficiency [2].
The general incidence of classical CAH in Caucasians is around 1 in 15000, whereas in India, the cumulative prevalence is as high as 1 in 5762 [3] [4].
Biochemical investigations in CAH based on 17-hydroxyprogesterone (17OHP) measurements have been shown to yield false-positive results due to assay interference with other steroid intermediates [5]. Further, these enzymatic assays cannot explain disease severity or distinguish heterozygotes. Despite these drawbacks in biochemical testing, genetic screening in 21-OH CAH is still not used as a first-tier diagnostic tool [6] due to the pseudogene-imposed complexities in genotyping the CYP21A2 gene. However, with a high prevalence and carrier frequency, there is a need for cost-effective, sensitive, and specific genetic screening strategies to confirm CAH diagnosis, understand the phenotypic severity, rule out carrier state and provide genetic counseling.
The other enzyme defects in steroidogenesis include 11β-hydroxylase (CYP11A1, CYP11B1), 3β-hydroxysteroid dehydrogenase (HSD3B2), 17α-hydroxylase (CYP17A1), and cytochrome P450 oxidoreductase (POR). Though these gene defects contribute to relatively rare forms of CAH, there is a need for molecular investigations to analyze the mutation spectrum in these genes. Also, with the advent of NGS-based screening, the clinical utility of these strategies in CAH needs to be evaluated.

Methods
DNA extraction from whole blood was carried out using the QIAGEN® kit (Hilden, Germany). Long-range PCR amplification of the CYP21A2 (6.2 kbp) and CYP21A1P (6.1 kbp) genes was carried out as per the published protocols [7]. Specificity of the amplification was further validated by restriction digestion with TaqI enzyme at 65 °C for two hours [7].

Allele-Specific PCR for hotspot screening
The CYP21A2 long-range PCR product was used as a template for novel in-house designed allele-specific PCR (ASPCR) primers to genotype eight known hotspot mutations in 21-OH deficiency: P30L, I2G, 8BPdel, I172N, E6CLUS (I235N, V236E, M238K), V281L, Q318X, and R356W. The assay was standardized using different primer sets for the wild-type and mutant alleles. The ASPCR was carried out with EmeraldAmp® Max PCR master mix (Takara Bio Inc, Japan) (Table 1). All the samples were genotyped along with appropriate positive and negative controls, and the results were validated with both Sanger and Next-Generation Sequencing (Fig 1).
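The use of separate primer sets for the wild-type and mutant alleles implies a simple reading rule for each hotspot. The sketch below illustrates that generic allele-specific PCR logic; it is not the authors' scoring procedure, and the function names and the boolean band inputs are hypothetical.

```python
# Hypothetical sketch of generic allele-specific PCR interpretation.
# Each hotspot is assumed to be tested in two reactions: one with the
# wild-type-specific primer and one with the mutant-specific primer.

HOTSPOTS = ["P30L", "I2G", "8BPdel", "I172N", "E6CLUS", "V281L", "Q318X", "R356W"]


def call_aspcr_genotype(wild_type_band: bool, mutant_band: bool) -> str:
    """Turn the presence or absence of the two bands into a genotype call."""
    if wild_type_band and mutant_band:
        return "heterozygous"
    if mutant_band:
        return "homozygous mutant"
    if wild_type_band:
        return "wild type"
    # Neither reaction amplified: the result is uninformative.
    return "no amplification"


def genotype_sample(bands: dict) -> dict:
    """bands maps each hotspot name to (wild_type_band, mutant_band)."""
    return {h: call_aspcr_genotype(*bands[h]) for h in HOTSPOTS}


# Illustrative input only: a sample carrying I2G on both alleles.
example = {h: (True, False) for h in HOTSPOTS}
example["I2G"] = (False, True)
calls = genotype_sample(example)
```

A call of "no amplification" would normally prompt a repeat of the reaction rather than a genotype assignment.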

Next-Generation Sequencing strategy
A multiplex PCR was utilized for target enrichment followed by NGS for five genes: CYP21A2, CYP11B1, CYP17A1, and POR, along with CYP19A1, defects in which cause aromatase deficiency that can mimic CAH. The PCR assay was standardized with Qiagen® Multiplex PCR mix to amplify the coding and splice site regions of the above genes with the in-house designed primers. These PCR products were sheared separately and pooled along with the long-range PCR product of the CYP21A2 gene. Library preparation and NGS using the Ion Torrent PGM™ were carried out as per previously published protocols [8]. Data analysis was carried out using the Ion Torrent Suite software (Version 5.0.4.0) and DNASTAR software v13. The classification of the identified variants was based on the ACMG 2015 guidelines [9]. Varsome and other widely available online tools were used for data interpretation [10,11]. All the clinically relevant variants were validated using Sanger sequencing.

Multiplex Ligation-dependent Probe Amplification (MLPA)
MLPA was carried out for the samples suspected of large deletions and rearrangements based on long-range PCR and restriction digestion results. The assay was standardized with SALSA MLPA Probemix P050 from MRC-Holland® [12] using reference samples as per the manufacturer's protocol. The results were analyzed using Coffalyser software [13].

Results
A total of 72 subjects (49 paediatric and 23 adults) were included in the study, of whom 66 subjects were clinically suspected of having 21-OH deficiency while six were suspected of 11β-OH deficiency. Sixty-seven were from the southern part of the country, and five were from northern India, with 32 males and 40 females.
Among the subjects with 21-OH deficiency, 60.6% (n=40) were of Salt-Wasting (SW) phenotype, 31.8% (n=21) of Simple Virilizing (SV) phenotype and 7.6% (n=5) had Non-Classical (NC) CAH. The age at diagnosis varied from 1 to 95 days in SW type, two weeks to 8 years in SV type, and 3.5 to 26 years in NC CAH.
The consanguinity proportions were 42.5% (SW), 30% (SV), and 50% (NC). The mean basal 17-OHP values were 127.9, 36.1, and 21 ng/ml in the SW, SV, and NC phenotypes respectively. Subjects with SW phenotype presented with a mean sodium of 126.4 meq/l and a mean potassium of 7.1 meq/l. A short synacthen test confirmed poor cortisol response among 17 patients with SW phenotype. In the four male and two female subjects suspected of 11β-OH deficiency (5 paediatric and 1 adult), the age at diagnosis varied from 2.5 to 14 years with a mean basal 17-OHP of 10.7 ng/ml, and 40% of the patients were born of consanguineous marriages.

Long range PCR and MLPA
The long-range PCR yielded specific amplification of the functional gene and pseudogene in 62/66 samples with appropriate restriction digestion patterns. Subjects C15, C62, and C71 had only the functional gene amplified, with a restriction digestion pattern of the pseudogene. On the other hand, subject C29, with only the pseudogene amplified, gave a restriction digestion pattern of the functional gene. Based on these results, MLPA was carried out, which showed a homozygous large deletion involving 5' of CYP21A1P and 3' of CYP21A2 in subjects C15, C62, and C71. These three subjects were homozygous positive for all eight hotspot mutations in ASPCR. MLPA results also confirmed a large gene conversion involving 5' of CYP21A2 and 3' of CYP21A1P in subject C29.
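The triage described here, in which a missing amplicon or a mismatched digestion pattern leads to MLPA, can be written as a small rule. This is a minimal sketch under that reading of the text; the function name and the data layout are hypothetical and do not come from the study protocol.

```python
# Hypothetical triage rule: refer a sample to MLPA when the long-range
# PCR and TaqI digestion results are not the expected ones.

EXPECTED_AMPLICONS = {"CYP21A2", "CYP21A1P"}


def needs_mlpa(amplified: set, digest_pattern: dict) -> bool:
    """
    amplified: names of the amplicons that were obtained.
    digest_pattern: maps each obtained amplicon to the gene whose
    restriction pattern it showed ("CYP21A2" or "CYP21A1P").
    """
    # A missing amplicon suggests a large deletion or conversion.
    if amplified != EXPECTED_AMPLICONS:
        return True
    # An amplicon that digests like the other gene suggests a chimera.
    return any(digest_pattern.get(name) != name for name in amplified)


# Pattern reported for subjects C15, C62 and C71: only the functional
# gene amplified, but with the pseudogene digestion pattern.
flagged = needs_mlpa({"CYP21A2"}, {"CYP21A2": "CYP21A1P"})
```

Under this rule, the 62 samples with both amplicons and matching digestion patterns would not be flagged.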

Allele-Specific PCR for hotspot screening
Utilizing the in-house designed ASPCR approach, CYP21A2 hotspot mutations were identified in 55/62 subjects: 33 SW, 19 SV, and 3 NC CAH. Out of 33 subjects with SW phenotype, 25 (75.8%) had biallelic mutations, seven (21.2%) had multiple heterozygous mutations, while one subject (C3) was positive only for a heterozygous Q318X mutation. Among the 19 subjects with SV phenotype, 10 (52.6%) had biallelic mutations, eight (42.1%) had multiple heterozygous mutations, and one subject (C33) was heterozygous for the 8BP deletion in exon 3. Out of five non-classical subjects, two were positive for biallelic mutations (40%) and one subject (C65) had a single heterozygous V281L hotspot mutation. The remaining seven subjects were negative for ASPCR.
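The proportions quoted in this paragraph can be rechecked from the counts alone. The short sketch below does that arithmetic; the counts are taken from the text, while the helper name is purely illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as quoted in the text."""
    return round(100 * part / whole, 1)


# Counts as reported for the ASPCR hotspot screen.
overall_yield = percent(55, 62)   # ASPCR-positive subjects out of 62
sw_biallelic = percent(25, 33)    # SW subjects with biallelic mutations
sw_multi_het = percent(7, 33)     # SW subjects with multiple heterozygous mutations
sv_biallelic = percent(10, 19)    # SV subjects with biallelic mutations
sv_multi_het = percent(8, 19)     # SV subjects with multiple heterozygous mutations
nc_biallelic = percent(2, 5)      # NC subjects with biallelic mutations

print(overall_yield, sw_biallelic, sw_multi_het, sv_biallelic, sv_multi_het, nc_biallelic)
```

Note that the SW and SV percentages use the ASPCR-positive subjects in each group as the denominator, whereas the NC percentage uses all five non-classical subjects.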

Next Generation Sequencing for a targeted panel of 5 genes in CAH
The NGS assay for the single gene CYP21A2 was carried out for all the samples with 21-OH deficiency, both positive and negative for ASPCR. The five-gene NGS panel was utilized for those subjects negative for ASPCR and those with suspected 11β-OH deficiency. The multiplex PCR-NGS assay covered the complete coding and splice site regions of the five genes included in the panel. The mean base coverage for the five genes was 700X and >99% of the target had a minimum coverage of 20X. Further, with NGS, no additional samples were positive for the eight hotspot mutations, corroborating the sensitivity and specificity of the ASPCR assay.
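The two coverage figures reported above (mean depth and the fraction of target bases at or above 20X) are simple summaries of per-base read depth. A minimal sketch of how such figures can be derived follows; the input list of depths is hypothetical, and the study's values came from the Ion Torrent software rather than from code like this.

```python
def coverage_summary(depths: list, min_depth: int = 20) -> tuple:
    """
    depths: read depth at each targeted base position.
    Returns (mean depth, fraction of positions with depth >= min_depth).
    """
    if not depths:
        raise ValueError("no target positions supplied")
    mean_depth = sum(depths) / len(depths)
    fraction_covered = sum(d >= min_depth for d in depths) / len(depths)
    return mean_depth, fraction_covered


# Toy example with four target positions, one of them below 20X.
mean_depth, fraction_covered = coverage_summary([700, 900, 500, 10])
```

In practice the per-base depths would be restricted to the coding and splice site regions defined for the panel.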
Additional homozygous CYP21A2 variants were identified in five out of seven subjects who were negative for ASPCR. These variants include c.1451G>C (p.Arg484Pro) in three subjects, c.143A>G (p.Tyr48Cys) in one subject and a novel c.1274G>T (p.Gly425Val) variant in one subject. Interestingly, subjects C31 and C32 with SW phenotype and subject C47 with SV phenotype were positive for the homozygous CYP21A2:c.1451G>C variant. The younger sibling of subject C31, with a similar clinical presentation of SW phenotype, was also homozygous positive for this recurrent variant. This variant has been previously reported in patients with both SW and SV phenotypes [14,15]. Subject C3, a one-year-old female with SW phenotype, was heterozygous positive for the Q318X mutation identified through ASPCR and heterozygous positive for the CYP21A2:c.1042G>A (p.Ala348Thr) variant identified through NGS. This subject was compound heterozygous for the above variants and inherited the Q318X mutation from the mother and the A348T variant from the father. In subjects C33 and C65, who only had a single heterozygous hotspot mutation identified in ASPCR, no additional CYP21A2 variants were identified through NGS. Subject C63, with a non-classical phenotype, was negative for variants in all five genes screened.
Subject C46, who was initially suspected of having a 21-OH SV phenotype, was negative for mutations in the CYP21A2 gene but was found to be positive for an aromatase gene variant, CYP19A1:c.1142A>T (p.Asp381Val). Born to second-degree consanguineous parents, this subject had ambiguous genitalia and clitoromegaly at birth with an elevated 17-OHP of 15.4 ng/ml. Deficiency of the aromatase enzyme due to CYP19A1 mutations has been reported to cause ambiguous genitalia in 46,XX females [16]. A recent report has also shown that aromatase deficiency can mimic SV CAH [17]. The majority of the in silico tools support a pathogenic prediction, and only two heterozygous alleles have been reported in South Asians so far (MAF 0.0000653, gnomAD exomes). However, there is a need for further investigations and family screening to confirm its pathogenicity, and based on the ACMG 2015 guidelines, this variant has been classified as a Variant of Uncertain Significance (VUS).
Out of six subjects with suspected 11β-OH deficiency, four had homozygous and two had compound heterozygous variants; four novel and two reported variants were identified in total (CYP11B1:c.1201-1G>A, c.1200+1delG, c.412C>T, c.623G>A, c.1024C>T, and c.1012dupC). Subject C56, an eight-year-old female born of a second-degree consanguineous marriage, presenting with hyperpigmentation, clitoromegaly, and hypertension, was homozygous positive for CYP11B1:c.412C>T (p.Arg138Cys). In vitro studies of this variant had demonstrated partially impaired CYP11B1 activity [18]. Subject C53, a 13-year-old male presenting with hypertension and hypokalemia from 2.5 years of age, was found to be compound heterozygous for two reported missense variants, CYP11B1:c.412C>T (p.Arg138Cys) and CYP11B1:c.623G>A (p.Arg208Gln). Subject C54, a 12-year-old male with hypertension and hyperpigmentation from five years of age, was compound heterozygous for a novel nonsense mutation CYP11B1:c.1024C>T (p.Gln342Ter) and a novel duplication CYP11B1:c.1012dupC (p.Gln338ProfsTer16), resulting in a frameshift at codon 338 followed by premature termination at codon 353 instead of 504.
Two unrelated subjects born to second-degree consanguineous parents (C51 and C55) were homozygous positive for a novel splice variant CYP11B1:c.1201-1G>A. Subject C51, a 19-year-old male, presented with precocious puberty at two years of age, followed by recurrent hypokalemic paralysis and hypokalemic cardiomyopathy. Subject C55 had an enlarged clitoris from birth and hypokalemic paralysis with hypertension at four years of age. Subject C52, presenting with hypertension, true puberty, and aggressive behavior, was positive for a novel homozygous splice variant CYP11B1:c.1200+1delG. The in silico analysis predicts aberrant splicing for these splice variants, and functional assays are required to confirm their pathogenicity.
No mutations were identified in the CYP17A1 and POR genes. The details of the individual variants identified through the NGS assay are given in Tables 2a and 2b, and the complete workflow with the results is depicted in Fig 2.

Parental Screening
Family screening was carried out using Sanger sequencing and the carrier status was confirmed in 18 out of 21 available paediatric parental samples. De novo homozygous mutations were identified in three probands as their parents were negative for those mutations. Family screening is incomplete in the other study subjects due to the unavailability of one or both parental samples.

Genotype-phenotype correlation
The majority of the subjects with SW phenotype in this study had null or group A mutations, which are known to result in <1% of the enzymatic activity, and the I2G hotspot was the predominant genotype identified. The SV phenotype had most of its genotypes falling in group B (1-2% enzyme activity), with the predominant genotype being I172N. Two out of five NCCAH subjects were positive for the I2G mutation in the homozygous state, under Group A. All the subjects with null mutations presented with SW phenotype. Of the classical subjects, 9% were positive for P30L, which is usually predicted to result in NCCAH. I2G, a SW genotype, was identified in 28% of the subjects with SV phenotype. I172N, usually reported in SV phenotype, was seen in 16% of the subjects with SW phenotype. Eleven subjects had multiple homozygous and heterozygous mutations, indicating smaller gene conversions involving multiple exons that are frequently observed in 21-OH deficiency [19]. The frequency of the mutated alleles and their associated phenotype is depicted in Fig 3. Table 3 explains the different groups of genotypes identified and their correlation with clinical phenotypes.
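The concordance reasoning in this section can be made concrete with a small lookup. The sketch below is illustrative only and rests on stated assumptions: the group assignments for Q318X, 8BPdel, V281L and P30L, the label "group C" for non-classical genotypes, and the rule that the milder allele predicts the phenotype are conventions commonly used for CYP21A2 and are not spelled out in this text. Only the placement of I2G in group A and I172N in group B is taken from the paragraph above.

```python
# Assumed genotype groups (see lead-in); ordered from most to least severe.
SEVERITY = ["null", "A", "B", "C"]
GROUP_OF = {
    "Q318X": "null",   # assumption
    "8BPdel": "null",  # assumption
    "I2G": "A",        # stated in the text
    "I172N": "B",      # stated in the text
    "V281L": "C",      # assumption
    "P30L": "C",       # assumption
}
EXPECTED = {"null": "SW", "A": "SW", "B": "SV", "C": "NC"}


def expected_phenotype(allele1: str, allele2: str) -> str:
    """Predict the phenotype from the milder of the two alleles (assumed rule)."""
    milder = max(GROUP_OF[allele1], GROUP_OF[allele2], key=SEVERITY.index)
    return EXPECTED[milder]


def is_concordant(allele1: str, allele2: str, observed: str) -> bool:
    """True when the observed phenotype matches the predicted one."""
    return expected_phenotype(allele1, allele2) == observed


# A homozygous I2G genotype in a non-classical subject is discordant
# under these assumptions, as the text notes for two NCCAH subjects.
discordant_example = is_concordant("I2G", "I2G", "NC")
```

Subjects carrying several hotspot mutations on one allele would need the most severe mutation per allele to be chosen first, which this sketch leaves out.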

Discussion
The genes implicated in CAH have been well recognized for decades, but genotyping in 21-OH deficiency is not routinely used as a first-line diagnostic tool. The underlying reasons include challenges in primer specificity to avoid pseudogene interference, assay standardization, cost-effectiveness, and analysis of complex rearrangements. In the present study, allele-specific PCR and NGS-based strategies have been established and utilized to study the spectrum of mutations in 72 Indian subjects.
ASPCR is highly preferred for genotyping as it is much simpler and more cost-effective than Southern blotting and RFLP, and does not require radioactive probes or restriction enzymes. Our initial efforts to replicate published ASPCR protocols did not achieve the expected specificity and sensitivity. Therefore, novel primer sets for the CYP21A2 hotspot mutations were designed in-house, and standardization was performed using specific template concentrations, primer dilutions, and annealing temperatures. Following standardization with appropriate controls, the study subjects were genotyped, and the results were validated with NGS and Sanger sequencing. Through these efforts, we have established an ASPCR assay that is 100% sensitive and specific, with a diagnostic yield of 88.7% (55/62). These results suggest that ASPCR is the most inexpensive tool for mass screening of the hotspot mutations in 21-OH deficiency.
The majority of earlier reports have utilized a combination of techniques to detect pseudogene-derived point mutations and large rearrangements, with Sanger sequencing employed to rule out other mutations in the CYP21A2 gene [19]. One of the studies from India reported mutations in 96.4% of the alleles, including four novel variants, utilizing RFLP, SSCP, and nested PCR [20]. In other studies, exclusively screening the hotspot mutations, Asanuma [24] positivity. The varying yield across these studies could be due to differences in the sample size and the number of hotspot mutations screened. Table 4 gives a comparison of the results of this study with previous reports on 21-OH deficiency from India.
In line with previous reports, we identified the I2G splice variant as the single most common mutation in the overall study population and in the SW phenotype.
Studies from the USA, UK, Cuba, Vietnam, China, and Sweden have reported the I2G splice variant as the most common pseudogene-derived mutation in their study cohorts [25][26][27][28][29][30]. In a mixed population study, New et al. reported I2G and V281L to be the prominent mutations identified in the largest cohort of 1507 subjects [31]. Even though the 30 kb deletion is one of the most common mutations observed in classical 21-OH CAH [32], we have identified this deletion in only three subjects with salt-wasting CAH.
In general, the genotype-phenotype correlation is strong in SW and NCCAH, as reported previously [15,31,33]. In our cohort, the correlation between null and group A genotypes and their corresponding SW phenotype was high. The concordance was poor among NCCAH subjects. The discrepancies in genotype-phenotype concordance of the P30L, I2G, and I172N mutations could be attributed to the phenotypic heterogeneity that is often reported in CAH [34] [35].
To further screen those subjects negative for the hotspot mutations and those with other forms of CAH, a five-gene panel for targeted Next-Generation Sequencing was established. The NGS-based assay achieved complete coverage of coding and splice site regions across the targeted panel of genes and picked up all the hotspot mutations identified earlier with ASPCR. However, the NGS data analysis for the CYP21A2 gene is challenging and requires specific BED files (defining the target region), along with the need to allow increased mismatches in case of suspected rearrangements to achieve alignment of the majority of NGS reads.
Interestingly, utilizing the NGS-based strategy, we have also identified a recurrent CYP21A2 mutation, c.1451G>C, in three (4.5%) different families. Further studies in large cohorts are required to confirm this finding before including this recurrent mutation in ASPCR-based screening in India. Two subjects (3%) carried a hotspot mutation on only a single allele. Nonetheless, there are earlier reports of CAH patients with only one affected allele [36,37]. One subject with a non-classical phenotype (1.5%) was negative for all five genes screened, requiring additional investigations to confirm the genetic diagnosis. Previous studies [39] have utilized NGS-based screening for the CYP21A2 gene and identified variants in 82.6% and 79.5% of CAH subjects. In this study, we have screened an extended NGS panel of genes in CAH, which is beneficial in patients with overlapping phenotypes of milder forms of CAH. One of the subjects in this cohort suspected of having SV 21-hydroxylase deficiency was positive for a CYP19A1 variant. Although NGS-based multigene testing provides a robust strategy, it is still expensive and requires complex computational infrastructure and expertise for its clinical utility in a developing country like India. Therefore, we recommend ASPCR as a first-tier screening tool for the CYP21A2 hotspots, followed by targeted NGS only for those subjects who are negative for ASPCR and those with other forms of CAH.

Study Limitations
Due to the small number of subjects under the category of rare forms of CAH, the NGS assay for CYP11B1, CYP17A1, POR and CYP19A1 genes requires further validation. The NGS assay needs the inclusion of other genes (HSD3B2, StAR, and CYP11A1) implicated in CAH and requires future studies in large cohorts to investigate these rarer forms of CAH.

Conclusion
To conclude, ASPCR followed by a multigene targeted NGS assay has been shown to be a robust strategy for comprehensive and cost-effective genetic screening of CAH in India. With this strategy, the genotype of 140 out of 144 (97.2%) alleles in a total of 72 subjects was characterized. These protocols could pave the way for their use as first-tier testing in a clinical setting for CAH diagnosis, carrier screening, newborn screening, and prenatal testing.

Declarations
Acknowledgements: Nil
Funding: The study was supported by independent FLUID research grants and Molecular Endocrinology Laboratory funds from Christian Medical College, Vellore, India.
Data availability: The data that support the findings of this study are available from the corresponding author upon reasonable request.
Informed Consent: Informed consent for genetic testing from all the adult subjects and assent from the parents of all the paediatric subjects were obtained.

[Figure: Frequency distribution of mutated alleles with CYP21A2 hotspots]