Prevalence of Thalassemia Carriers among Couples of Childbearing Age and Risk Prediction of Thalassemia

Background: Thalassemia is a highly prevalent hematologic disease in Guizhou, China. This study aimed to determine the epidemiological characteristics of thalassemia among couples of childbearing age in this population. Results: A total of 4481 couples of childbearing age were recruited for thalassemia-carrier screening using both traditional hematological tests and next-generation sequencing (NGS). Of the 8962 subjects, 1314 (14.66%) were identified as thalassemia carriers, including 875 (9.76%) α-thalassemia, 391 (4.36%) β-thalassemia, and 48 (0.54%) composite α- and β-thalassemia carriers. Among the couples, 38 were high-risk thalassemia carriers. In addition, 12 α-globin gene alterations and 16 β-globin mutations were detected, including four novel thalassemia mutations. --SEA was the most common α-thalassemia genotype (26.86%), CD41-42 was the most prevalent β-thalassemia genotype (36.57%), and αα/-α3.7 + CD41-42 was the most frequent composite α- and β-thalassemia genotype (18.75%). Ethnically, the Zhuang had the highest rate of thalassemia-gene carriers among the ethnic groups; geographically, Qiannan had the highest rate. Conclusion: These results enrich the genetic map of thalassemia and provide a basis for genetic counseling and fertility guidance for thalassemia carriers in Guizhou, China. NGS is so far the most accurate method for population-based thalassemia screening.


Background
Thalassemia is the most prevalent monogenic hematologic disorder; it affects millions of people and leads to thousands of deaths around the world every year. Approximately 5% of the world's population are thalassemia carriers [1,2]. Currently, a cure or effective treatment for children with thalassemia is not always available or affordable. For patients with severe thalassemia, hematopoietic stem cell transplantation is the only cure, but it is costly and imposes a heavy burden on families and society. Therefore, genetic screening and counseling for couples of childbearing age in regions with a high incidence of thalassemia is an effective preventive measure.
Thalassemia is highly prevalent in the southern provinces of China, where it has become a serious public health problem. Prevention of thalassemia has therefore become a strategic priority for reducing birth defects in southern China [3]. Guizhou, one of the provinces with a high incidence of thalassemia, is populated by multiple ethnic groups, and genetic screening and counseling are often unavailable because of economic difficulties. As a result, thalassemia has become increasingly prevalent in this population.
Next-generation sequencing (NGS) has been shown to allow rapid, multiplex, and high-throughput detection of genetic variants [4]. NGS-related technologies, applied to the whole genome, the exome, or targeted gene panels, have been used effectively in research settings as well as in the clinical testing and diagnosis of genetic disorders [5,6]. In this study, we used NGS to determine the mutation types, prevalence, and distribution of thalassemia in 4481 couples of childbearing age from the various ethnic groups inhabiting Guizhou, and then assessed each couple's risk of having a child with thalassemia.

Participants
A cohort of 4481 couples (8962 subjects) aged 19-45 years was recruited by simple random sampling in Guizhou Province, China. The participants came from multiple ethnic groups living in 9 regions across Guizhou Province (Figure 1). The subjects were divided into four age groups: 19-25 years, 1792; 26-30 years, 3136; 31-35 years, 3402; and 36-45 years, 632. All participants signed informed consent, and all experimental protocols in this study were approved by the Medical Ethics Committee of the Affiliated Hospital of Zunyi Medical University in accordance with the Declaration of Helsinki. Approximately 10 ml of peripheral blood was drawn into EDTA anticoagulant tubes from each subject and stored at 4°C until further use.
Traditional screening of thalassemia carriers using hematological phenotype analysis
All samples were first screened using traditional hematological methods, including routine blood examination and hemoglobin electrophoresis. Routine blood examinations were performed on an XE-5000 Automated Hematology Analyzer (Sysmex, Japan). Hematological positivity for thalassemia was defined as a mean corpuscular volume (MCV) < 80 fL or a mean corpuscular hemoglobin (MCH) < 27 pg. Hemoglobin electrophoresis was carried out on an automatic Hydrasys electrophoresis system (Sebia, France). Subjects were considered suspected thalassemia carriers when HbF > 1.2%, HbA2 > 3.5%, or HbA2 < 2.5%. Based on the internationally recommended cutoff values (MCV < 80 fL, MCH < 27 pg) for thalassemia diagnosis, subjects were identified as possible cases of β-thalassemia or composite α- and β-thalassemia [16].
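The screening rule above combines red-cell indices with electrophoresis results. The sketch below encodes those cutoffs as a hypothetical helper, `screening_flags`, to make the decision logic explicit. It assumes, as an illustration only, that meeting any single criterion is enough to refer a subject for DNA genotyping; the text does not state exactly how the criteria were combined, and the names here are not from the study's software.

```python
from dataclasses import dataclass

# Cutoffs as stated in the screening criteria (units in the names)
MCV_CUTOFF_FL = 80.0    # mean corpuscular volume, fL
MCH_CUTOFF_PG = 27.0    # mean corpuscular hemoglobin, pg
HBF_UPPER_PCT = 1.2     # HbF, % of total hemoglobin
HBA2_UPPER_PCT = 3.5    # HbA2 upper bound, %
HBA2_LOWER_PCT = 2.5    # HbA2 lower bound, %


@dataclass
class HematologyResult:
    mcv_fl: float
    mch_pg: float
    hbf_pct: float
    hba2_pct: float


def screening_flags(r: HematologyResult) -> list[str]:
    """Return the list of criteria this subject meets (hypothetical helper)."""
    flags = []
    if r.mcv_fl < MCV_CUTOFF_FL:
        flags.append("MCV<80fL")
    if r.mch_pg < MCH_CUTOFF_PG:
        flags.append("MCH<27pg")
    if r.hbf_pct > HBF_UPPER_PCT:
        flags.append("HbF>1.2%")
    if r.hba2_pct > HBA2_UPPER_PCT:
        flags.append("HbA2>3.5%")
    elif r.hba2_pct < HBA2_LOWER_PCT:
        flags.append("HbA2<2.5%")
    return flags


def refer_for_genotyping(r: HematologyResult) -> bool:
    # Assumption: any single positive criterion triggers DNA genotyping
    return bool(screening_flags(r))


# Example: microcytic, hypochromic indices with raised HbA2
print(screening_flags(HematologyResult(65, 20, 0.8, 5.2)))
```

Returning the list of triggered criteria, rather than a bare yes/no, keeps the reason for each referral visible, which is useful when reviewing false positives against later genotyping.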
Genomic DNA extraction and genotypic analysis using traditional methods
Genomic DNA was extracted from blood with the MagPure Buffy Coat DNA Midi KF Kit (Magen, China) and the GenMag Nucleic Acid Isolation Kit (GenMagBio, Beijing, China). Subjects positive for thalassemia on traditional hematological tests were further genotyped using Gap-PCR and multiplex ligation-dependent probe amplification (MLPA) for copy-number variants (CNVs), reverse dot blot (RDB) and high-resolution melting analysis (HRMA) for single-nucleotide variants (SNVs), and Sanger sequencing. All techniques were performed according to routine protocols.

NGS Genotyping
To determine whether NGS genotyping is superior to traditional hematological tests followed by Gap-PCR genotyping, we analyzed all samples with NGS. NGS combined with Gap-PCR was used to screen for thalassemia gene mutations, and the specific experimental methods were the same as those used in our previous studies [3,8].

Data Analysis and Interpretation
Processing of the raw sequencing data and description of the gene mutations were performed as described in the references. All statistical analyses were performed using SPSS 21.0 software. Count data are presented as number (n) and percentage (%). The χ2 test was used to compare groups, and P < 0.05 was considered statistically significant.
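For a comparison of carrier rates between two groups, the χ2 test reduces to a 2×2 contingency table. The sketch below computes Pearson's χ2 statistic without continuity correction and its p-value for one degree of freedom; it is a stand-in for the SPSS procedure, and whether the authors applied Yates' correction is not stated, so that choice is an assumption. The counts in the example are hypothetical and are not taken from the study.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the 2x2 table
        group 1: a carriers, b non-carriers
        group 2: c carriers, d non-carriers
    Returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Shortcut formula for a 2x2 table: n(ad - bc)^2 / (product of margins)
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, chi2 = Z^2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts: 120/600 carriers in group 1 vs 80/800 in group 2
stat, p = chi_square_2x2(120, 480, 80, 720)
print(f"chi2 = {stat:.2f}, P = {p:.2e}, significant: {p < 0.05}")
```

For larger tables (for example, carrier rates across all nine regions), a general r×c test with (r-1)(c-1) degrees of freedom would be needed instead of this 2×2 shortcut.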
Other related analyses have been described in our previous report [17] (Fig. 1).

Results
Thalassemia carriers identified by hematological examinations and traditional DNA sequencing
Among the 8962 participants, 1961 subjects were initially identified as suspected thalassemia carriers by hematological examinations, a positivity rate of 21.88% (1961/8962). However, only 988 subjects were confirmed as thalassemia carriers by traditional DNA sequencing, including 658 α-thalassemia carriers, 294 β-thalassemia carriers, and 36 composite α- and β-thalassemia carriers. The overall detection rate was 11.02% (988/8962).
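The rates in this paragraph follow directly from the reported counts. The short calculation below reproduces them and adds one derived figure not stated in the text: the share of hematology-positive subjects that DNA testing confirmed, which indicates how many hematological positives were false alarms. The variable names are illustrative only.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * numerator / denominator, 2)


SUBJECTS = 8962
HEMATOLOGY_POSITIVE = 1961
# Carriers confirmed by traditional DNA testing, by type
CONFIRMED = {"alpha": 658, "beta": 294, "alpha+beta": 36}

confirmed_total = sum(CONFIRMED.values())

print(pct(HEMATOLOGY_POSITIVE, SUBJECTS))         # 21.88 (hematological positivity)
print(pct(confirmed_total, SUBJECTS))             # 11.02 (overall detection rate)
print(pct(confirmed_total, HEMATOLOGY_POSITIVE))  # 50.38 (derived: positives confirmed)
```

Roughly half of the hematology-positive subjects were confirmed, consistent with the later observation that red-cell indices alone cannot distinguish thalassemia traits from other microcytic conditions.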

Geographic Distribution of thalassemia gene carriers in Guizhou
Among the 9 regions in Guizhou Province, Qiannan had the highest carrier rate of α-thalassemia (Fig. 3).

Identification of high-risk couples with thalassemia by NGS
In this study, NGS identified 0.85% (38/4481) of the couples as being at high risk for thalassemia, whereas routine techniques identified only 0.36% (16/4481) of the couples as high-risk carriers (Fig. 3, Table 4). Thus, compared with the traditional screening and detection methods, NGS identified 22 (23.2%) more high-risk couples. Among the 38 couples, 10 couples carried the --SEA/αα genotype and were at high risk for Hb Bart's hydrops fetalis, 11 couples were at high risk for HbH disease, and the other 16 couples were carriers of heterozygous mutations. Interestingly, about half of these couples were from the same ethnic group and lived in an isolated area of Guizhou Province, suggesting that founder mutations may exist in this particular subpopulation.
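The high-risk categories above follow from simple Mendelian inheritance of α-globin haplotypes. The sketch below is a simplified model, not the authors' risk-assessment pipeline: it counts functional α-globin genes on a few deletional haplotypes only (the dictionary `ALPHA_GENES` and the function name are hypothetical), enumerates the four equally likely combinations of parental haplotypes, and classifies each child by total functional α genes (0: Hb Bart's hydrops fetalis; 1: HbH disease). It ignores non-deletional variants such as αCSα, β-globin mutations, and α/β interactions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Functional alpha-globin genes on each haplotype (simplified, deletional only)
ALPHA_GENES = {"αα": 2, "-α3.7": 1, "-α4.2": 1, "--SEA": 0}


def offspring_alpha_risk(mother: tuple[str, str],
                         father: tuple[str, str]) -> dict[str, Fraction]:
    """Probability of each offspring category, assuming each parent passes
    on either haplotype with probability 1/2 (hypothetical helper)."""
    outcomes = Counter()
    for hap_m, hap_f in product(mother, father):
        genes = ALPHA_GENES[hap_m] + ALPHA_GENES[hap_f]
        if genes == 0:
            outcomes["Hb Bart's hydrops fetalis"] += 1
        elif genes == 1:
            outcomes["HbH disease"] += 1
        elif genes < 4:
            outcomes["carrier"] += 1
        else:
            outcomes["unaffected"] += 1
    return {category: Fraction(count, 4) for category, count in outcomes.items()}


# Both partners --SEA/αα: 1 in 4 risk of Hb Bart's hydrops fetalis per pregnancy
print(offspring_alpha_risk(("--SEA", "αα"), ("--SEA", "αα")))
# --SEA/αα x -α3.7/αα: 1 in 4 risk of HbH disease
print(offspring_alpha_risk(("--SEA", "αα"), ("-α3.7", "αα")))
```

Using `Fraction` keeps the per-pregnancy risks exact (1/4, 1/2), which matches how such risks are usually communicated in genetic counseling.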
Higher detection rate of thalassemia using NGS than traditional techniques
When traditional techniques, including hematological tests and Sanger DNA sequencing, were used, only 988 subjects were confirmed as thalassemia carriers, a detection rate of 11.02% (988/8962). In contrast, high-throughput NGS identified 1314 thalassemia carriers, a detection rate of 14.66% (1314/8962). Thus, 326 carriers (3.64%, 326/8962) were missed by routine detection techniques. Of these 326 cases, 74 were undetectable by conventional thalassemia detection techniques (Table 5), and the other 252 were missed because of the lower sensitivity of the hematological examination or the limitations of conventional thalassemia gene detection technology. In addition, NGS identified 38 couples (0.85%, 38/4481) at high risk for thalassemia, whereas traditional methods detected only 16 couples (0.36%, 16/4481).
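The comparison in this paragraph can be checked from the raw counts. The calculation below assumes, as the text implies, that every carrier confirmed by traditional methods was also detected by NGS, so the difference between the two totals equals the number of carriers missed by routine techniques; the variable names are illustrative.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


SUBJECTS, COUPLES = 8962, 4481
ngs_carriers, traditional_carriers = 1314, 988
missed_undetectable, missed_other = 74, 252  # breakdown reported in the text
ngs_high_risk, traditional_high_risk = 38, 16

# Assumption: traditional detections are a subset of NGS detections
missed = ngs_carriers - traditional_carriers
assert missed == missed_undetectable + missed_other  # both give 326

print(pct(ngs_carriers, SUBJECTS))          # 14.66
print(pct(traditional_carriers, SUBJECTS))  # 11.02
print(pct(missed, SUBJECTS))                # 3.64
print(pct(ngs_high_risk, COUPLES), pct(traditional_high_risk, COUPLES))  # 0.85 0.36
print(ngs_high_risk - traditional_high_risk)  # 22 additional high-risk couples
```

Because both methods were applied to the same subjects, a paired test such as McNemar's would be the natural significance test for the difference in detection, if the full cross-tabulation of results were available.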

Discussion
Thalassemia is a serious, potentially lethal and disabling monogenic hereditary hematological disease that is common in particular areas of the world. Genetic testing and counseling for couples of childbearing age in those regions play a central role in preventing or reducing the birth of children with thalassemia [9,10]. Guizhou Province, located in the mountainous plateau of southwestern China, is a multi-ethnic province with a high incidence of thalassemia. In this study, we therefore determined the prevalence of thalassemia carriers of childbearing age in Guizhou by recruiting 4481 couples for thalassemia screening using both traditional methods and NGS.
Our results indicated that the overall frequency of thalassemia gene carriers was as high as 14.66%. Geographically, among the 9 regions of Guizhou Province, Qiannan had the highest carrier rate of thalassemia, followed by Qianxinan, Qiandongnan, and Anshun. The higher carrier rates in these regions may be related to their poor economic conditions, geographical environment, and ethnic composition. In addition, ethnic groups including the Buyi, Yi, Miao, and Zhuang, among whom consanguineous marriage is practiced, inhabit these regions, which leads to a higher prevalence of thalassemia through founder genetic alterations. In contrast, the Han majority showed the lowest carrier rate of thalassemia genes (8.31%) in these regions.
To date, 17 common types of α-globin gene alterations have been identified in the Chinese population, and silent α-thalassemia (-α3.7/, -α4.2/, αCSα/) accounts for 2.3% [11]. In this study, we detected 12 of these 17 types of α-globin gene alterations, of which --SEA/αα, -α3.7/αα, and -α4.2/αα were the most common. The genotype pattern of α-thalassemia in Guizhou is similar to that in other southern provinces such as Fujian, Guangdong, and Guangxi [12,13].
In addition, because the frequency of the Southeast Asian deletion type (--SEA) is as high as 26.86%, there is a considerable chance that both partners in a couple carry --SEA [14]. Indeed, among the 38 high-risk couples identified in this study, 10 couples carried the --SEA/αα genotype and could have a child with Hb Bart's hydrops fetalis; unfortunately, three of these couples gave birth to three children with thalassemia before the results were delivered to them. One of the children died, and the other two are receiving regular blood transfusion therapy. The other 28 high-risk couples, including 7 couples at risk for HbH disease and 8 couples with heterozygous mutations, receive regular follow-up and participate in our fertility guidance. Thus, to prevent the birth of children with severe thalassemia, it is essential to provide genetic counseling and fertility guidance combined with prenatal diagnosis and follow-up for high-risk couples.
In addition, if one partner carries composite α- and β-thalassemia and the other is not a --SEA/αα carrier, no children with intermediate or severe thalassemia will be born. In this study, we identified 48 carriers of composite α- and β-thalassemia, but in only 4 cases did the spouse carry the --SEA/αα genotype. These 4 couples are currently receiving fertility guidance.
Thalassemia is difficult to diagnose in the early stages of life. In this study, we demonstrated that NGS is superior to traditional hematological tests. A total of 1314 thalassemia carriers were detected with NGS, whereas only 988 carriers were confirmed when traditional methods were used for screening. Thus, 326 (3.64%) thalassemia carriers were missed by traditional methods: 252 because of negative hematological test results, and 74 because they were undetectable by Gap-PCR/PCR-RDB/Sanger sequencing. Traditional hematological tests often yield false-negative or false-positive results because thalassemia traits are confused with other microcytic hypochromic conditions such as hemoglobinopathies and iron deficiency anemia [15], while Gap-PCR or PCR-RDB plus Sanger DNA sequencing misses mutations at unknown sites and many large insertions/deletions [2]. Thus, high-throughput NGS is currently the most accurate method for detecting thalassemia genes.
In summary, we identified 1314 (14.66%) thalassemia carriers among 4481 couples (8962 subjects) of childbearing age in Guizhou Province, and determined their geographical and ethnic distribution and the genetic map of thalassemia-gene clusters in this population. Among the carriers, 38 couples were identified as high-risk thalassemia carriers and are currently under follow-up and fertility guidance. In addition, we demonstrated that NGS is more accurate than traditional methods for thalassemia-carrier screening. These results enrich the genetic map of thalassemia in China and provide a theoretical basis for formulating strategies for the prevention and management of thalassemia.

Declarations
Acknowledgements
We thank all the participants for their cooperation and contribution. We are grateful to BGI-Shenzhen for performing the NGS and analyzing the sequencing data.

Ethics approval and consent to participate
This study was approved by the Research Ethics Committee at Zunyi Medical University. Written informed consent was obtained from all participants. For participants under 16 years old, written informed consent was obtained from a parent or guardian.

Consent for publication
Written informed consent for publication of identifying images or other personal or clinical details was obtained from all of the participants included in the study. For participants under 18 years old, written informed consent was obtained from a parent or legal guardian.

Availability of data and materials
All data and materials are included in this article.

Competing interests
The authors declare no conflicts of interest. All experiments undertaken in this study comply with the current laws of China, where the research was performed.
Note: α-thal, cases of α-thalassemia; F, genotype frequency; β-thal, cases of β-thalassemia; α- & β-thal chr, α- and β-thalassemia chromosomes; CR, constituent ratio; AN, allele number; P, percentage.
Note: α-thal, cases of α-thalassemia carriers; F, genotype frequency; β-thal, cases of β-thalassemia carriers; α- & β-thal chr, α- and β-thalassemia chromosomes; CR, constituent ratio; P, percentage.