Thalassemia is a common genetic disease that causes significant public health problems and social burdens in endemic areas (21, 22). In recent years, the incidence of thalassemia has gradually decreased with the improvement and widespread adoption of genetic counseling and prenatal diagnosis (PND) technologies (23, 24). However, a high prevalence of thalassemia is still reported in southern China, owing to the lack of PND and genetic counseling (9, 18). The overall prevalence of α-thalassemia, β-thalassemia, and α + β-thalassemia in China has been reported as 7.880%, 2.210%, and 0.480%, respectively (25). Ganzhou is the southernmost city of Jiangxi Province, central China, and it is adjacent to Guangdong Province, which has one of the highest incidence rates of thalassemia in China (26). Therefore, investigating the genotypes and distribution of thalassemia in Ganzhou is of great significance for providing a theoretical basis for PND and genetic counseling.
In this study, NGS was applied for large-scale population screening to assess the frequency of thalassemia carriers among people in the Gannan region. The results demonstrated the great heterogeneity and broad spectrum of thalassemia in the Gannan population. The overall frequency of thalassemia was 14.545%, which was significantly higher than the nationwide figure of 10.570% (25). Furthermore, the incidence of α-thalassemia (10.489%) was significantly higher than that of β-thalassemia (3.610%) in the Gannan region (p < 0.05), which was in accordance with previous studies (14). These results indicate that thalassemia is a serious public health problem in the Gannan region. Interestingly, the prevalence of thalassemia decreased from the south to the north in this province. The region with the highest prevalence was Dingnan (18.317%), followed by Xunwu (17.723%). One reason for this trend may be that Dingnan and Xunwu are situated in southeastern Jiangxi Province at the junction of Fujian, Guangdong, and Jiangxi Provinces. The vast majority of the residents in Dingnan are Hakka people, who have previously been reported to have a high prevalence of thalassemia (27, 28). More importantly, because NGS was applied to a large population, our data more accurately reflect the prevalence of thalassemia and the distribution of rare thalassemia genotypes in the Gannan region.
The detection rate for α-thalassemia in this study (10.489%) was significantly higher than the rates previously reported for the Gannan region (7.190%) and for Jiangxi (2.600%) (14, 29). We attribute the differences to the different genetic screening methods and sample sizes used in these studies. In addition, we identified 40 distinct α-thalassemia genotypes with 21 different variations, among which αα/--SEA was the most common subtype, with a remarkable proportion of 54.105%, followed by -α3.7/αα (28.011%) and -α4.2/αα (8.687%), which was consistent with previous reports (14, 30). Apart from these common variation types, other variations with rare or novel mutations were also identified. Hb Phnom Penh (HBA1: p.Phe118_Thr119insIle) is a rare variant caused by the insertion of ATC (encoding isoleucine) between codons 117 and 118, a site identified as a hotspot for nucleotide insertions within exon 3 of the α1-globin gene. It was first reported in the Cambodian population (31) but has rarely been reported in mainland China or Taiwan province (32, 33). --THAI (NC_000016.9: g.199800_233300del), which has been reported in southern China but not in Jiangxi Province, was also detected in this study (34–36). Furthermore, we detected other rare genotypes that have not previously been reported in Jiangxi Province, including Alpha2 Codon 30 del GAG, Initiation codon (-T), and αfusion. These novel findings greatly enrich the database of known thalassemia alleles in the Gannan region.
A total of 35 β-thalassemia variations with 42 genotypes that had not been reported in our previous study using RDB gene chips were identified in this cohort. These results suggest that NGS is preferable to RDB gene chips for the screening of rare variants (14). The prevalence of β-thalassemia (3.36%) in this study was much higher than the reported average of 2.21% in China (25). In addition to conventional β-thalassemia mutants, rare deletion variants, including Chinese Gγ+(Aγδβ)0, SEA-HPFH, and Taiwanese deletion, were also detected. Regarding β-thalassemia genotypes, IVS-II-654 (C > T)/βN and codons 41/42 (-TTCT)/βN were the two most frequently detected subtypes, accounting for 35.257% and 28.368%, respectively. The two major mutations were likewise ranked IVS-II-654 (C > T) first and codons 41/42 (-TTCT) second, which agreed with our previous observations (14). Interestingly, these results were consistent with those reported for the Hakka population in Meizhou, Guangdong Province (27, 30), implying that the prevalence of β-thalassemia and its genotype distribution are geographically associated. In addition to the higher detection rate, our study also detected some rare β-thalassemia mutations that had not been reported previously, such as -50 (G > A) and 5'UTR + 43 to + 40 (-AAAC), which accounted for 11.339% of all β-thalassemia genotypes.
Unexpectedly, two mutant homozygotes (-28 (A > G)/-28 (A > G)) and nine compound heterozygotes were identified in this study, and the hematological parameters of the affected individuals were typical of thalassemia (microcytic hypochromic anemia). Moreover, the compound heterozygotes involved common mutations (-28 (A > G), codons 41/42 (-TTCT), IVS-II-654 (C > T), codon 17 (A > T)) as well as rare mutations (Chinese Gγ+(Aγδβ)0, -50 (G > A), 5'UTR + 43 to + 40 (-AAAC), CAP + 8 (C > T)). Conventional thalassemia genetic testing methods would not be able to accurately determine the genotypes of these individuals.
With the development of NGS techniques in recent years, NGS has emerged as a powerful and more affordable tool for prenatal screening (18, 37). To date, several studies have applied NGS to the study of thalassemia and have made great progress (38, 39). In our study, NGS was applied for thalassemia screening at a cost of $10 per sample. In total, 56 thalassemia mutations, including 48 rare mutations, were identified. Of these, only 23 would have been detected using traditional detection methods such as RDB and gap-PCR (20), and the remaining 33 would have been missed. In other words, 4.010% (795/19,827) of the population would be missed or misdiagnosed using traditional screening methods. Traditionally, RBC analysis combined with hemoglobin electrophoresis and a description of clinical manifestations is used for preliminary screening of thalassemia, and PCR or genome sequencing is then used to confirm positive cases before diagnosis (1). Because of the low sensitivity of hematological analysis and the limitations of PCR, a large number of novel or rare thalassemia variations would be missed or misdiagnosed using traditional screening methods. Our findings suggest that NGS can fill this gap by effectively identifying new mutations and reducing the rate of misdiagnosis.
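The arithmetic behind these figures can be checked with a short sketch. It uses only numbers quoted in the text; the variable names are illustrative, and treating the 21 α-thalassemia and 35 β-thalassemia variations as the components of the 56 mutations is an assumption based on the totals agreeing, not a breakdown stated by the authors.

```python
# Figures quoted in the text; variable names are illustrative only.
alpha_variants = 21               # distinct alpha-thalassemia variations
beta_variants = 35                # distinct beta-thalassemia variations
detectable_by_conventional = 23   # mutations covered by RDB and gap-PCR
missed_individuals = 795          # carriers missed by conventional methods
screened_individuals = 19_827     # denominator quoted with the 4.010% figure

# Assumption: the 56 mutations are the sum of the alpha and beta tallies.
total_variants = alpha_variants + beta_variants
missed_variants = total_variants - detectable_by_conventional
missed_rate_pct = 100 * missed_individuals / screened_individuals

print(f"mutations identified by NGS: {total_variants}")
print(f"mutations outside conventional panels: {missed_variants}")
print(f"individuals missed or misdiagnosed: {missed_rate_pct:.3f}%")
```

Under these assumptions the sketch reproduces the 56 and 33 mutation counts and the 4.010% figure given above.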
In summary, our study is the first to apply NGS to comprehensively analyze thalassemia in a large population of the Gannan region, Jiangxi Province. We demonstrated a high genetic diversity and a high prevalence of thalassemia in this region, findings that will be of great significance for the prevention and control of thalassemia in Gannan and other high-prevalence areas. More importantly, the identification of rare and novel variations demonstrates the necessity and importance of choosing NGS for thalassemia screening in large populations.