The p.Pro2232Leu variant in the ChEL domain of thyroglobulin gene causes intracellular transport disorder and congenital hypothyroidism

Thyroglobulin (TG), the predominant glycoprotein of the thyroid gland, functions as the matrix protein in thyroid hormonogenesis. TG deficiency results in thyroid dyshormonogenesis. Variants in the TG gene produce a heterogeneous spectrum of congenital goitre, with an autosomal recessive mode of inheritance. The purpose of this study was to identify and functionally characterize new variants in the TG gene in order to increase the understanding of the molecular mechanisms responsible for thyroid dyshormonogenesis. A total of four patients from two non-consanguineous families with marked alteration of TG synthesis were studied. Both families had previously been analysed in our laboratory, and only one deleterious allele was detected in each after sequencing the TG gene (c.2359 C > T [p.Arg787*] and c.5560 G > T [p.Glu1854*]). These findings were confirmed in the present study by Next-Generation Sequencing. The single nucleotide coding variants of the TG gene were then analyzed to predict the possible disease-causing variant. The p.Pro2232Leu variant (c.6695 C > T), identified in both families, shows a low population frequency in the gnomAD v2.1.1 database, and protein homology, amino acid prediction, and 3D modeling analyses predict a potential pathogenic effect of this variant. We also transiently expressed p.Pro2232Leu in a full-length rat TG cDNA clone and confirmed that this point variant was sufficient to cause intracellular retention of mutant TG in HEK293T cells. Consequently, the patients in each family are compound heterozygous for the p.Arg787*/p.Pro2232Leu or p.Glu1854*/p.Pro2232Leu variants. In conclusion, our results confirm the pathophysiological importance of altered TG folding as a consequence of missense variants located in the ChEL domain of TG.


Introduction
Congenital hypothyroidism (CH) is a deficiency in the formation of thyroid hormones (TH) by the thyroid gland due to environmental or genetic causes, characterized by high levels of TSH and low levels of TH [1,2]. CH is the most frequent inherited neonatal endocrine pathology, with an incidence of 1 in 1500 to 1 in 4000 live births, with frequent ethnic variations. CH due to variants of the TG gene has an estimated incidence of 1 in 67,000 to 1 in 100,000 newborns and shows an autosomal recessive inheritance pattern [3,4]. The classic clinical spectrum varies from mild to severe hypothyroidism with congenital goiter. TG is a large dimeric protein that comprises more than half of all thyroidal proteins. Recently, the 3-dimensional atomic structure of human and bovine TG has been reported [5][6][7][8]. To date, more than two hundred and ninety deleterious variants in the human TG gene have been reported in association with thyroid diseases, mainly CH [9,10].
In the present study we report that the human p.Pro2232Leu TG variant, previously considered a simple neutral polymorphism, causes significant intracellular retention in HEK293T cells, and we further confirm that the ChEL domain plays a key role in the intracellular trafficking of TG. In addition, two new CH-associated compound heterozygous variants were identified.

Materials and methods
A detailed clinical and laboratory evaluation of patients from family G [11] and family M [12] has been reported previously.
A custom next-generation sequencing (NGS) panel targeting genes associated with thyroid dyshormonogenesis (SLC5A5, SLC26A4, DUOX2, DUOXA2, TPO, IYD, TG) and thyroid dysembryogenesis (PAX8, FOXE1, NKX2-1, TSHR) was designed to amplify all exons and exon-intron junctions of the respective genes by multiplex PCR. Sequencing of these amplicon libraries was carried out using the MiSeq platform (Illumina, San Diego, CA). All variants detected were further validated by Sanger sequencing.
Amino acid sequences of the ChEL domain of TG and of ACHE from several species were compared using the CLUSTAL W (1.83) alignment (http://www.ch.embnet.org/software/ClustalW.html). Single Nucleotide Variants (SNV) were analyzed with the sequence-based predictors included in the VarSome tool [https://varsome.com]. The UCSF Chimera program was used to obtain the 3D model of the human TG (https://www.cgl.ucsf.edu/chimera/).
The prTG[p.Pro2233Leu] clone was generated from prTGWT [13] using the QuikChange Lightning Site-Directed Mutagenesis kit (Agilent, Santa Clara, CA) following the manufacturer's recommendations. HEK293T cells were cultured in DMEM with 10% FBS in plates at 37°C in a humidified 5% CO2 incubator. Plasmids were transiently transfected using Lipofectamine 2000 (Thermo Fisher Scientific, Waltham, MA). Supernatants and cell lysates were analyzed by reducing SDS-PAGE (7% gel), followed by Western blotting. A rabbit monoclonal anti-human TG antibody (ab156008, Abcam, Cambridge, UK) was used as the primary antibody. An HRP-conjugated donkey anti-rabbit IgG antibody (NA934, GE Healthcare, Buckinghamshire, UK) was used as the secondary antibody. Images were captured in a C-Digit Blot Scanner via Image Studio Software (LI-COR, Lincoln, NE).

Results
In two families with CH and low serum TG levels previously studied in our laboratory, G and M [11,12], only one deleterious allele was detected after sequencing the entire coding region, promoter region, and exon-intron boundaries of the TG gene (family G, c.2359 C > T [p.Arg787*]; family M, c.5560 G > T [p.Glu1854*]) (Fig. 1A). To study such monoallelic cases in greater depth, we developed a custom NGS panel. The pathogenic variants in the TG gene that had already been detected by Sanger sequencing were also identified by NGS; a second pathogenic variant was not identified, either in TG or in the remaining 10 genes.
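The correspondence between the coding (c.) and protein (p.) positions quoted in this report follows from simple codon arithmetic and can be checked with a short sketch. It assumes standard HGVS numbering, in which c.1 is the A of the ATG start codon; the function name `codon_of` is illustrative and is not part of the study's pipeline.

```python
def codon_of(c_pos: int) -> tuple[int, int]:
    """Return (codon number, base position 1-3 within that codon) for a coding position."""
    if c_pos < 1:
        raise ValueError("coding positions start at 1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


# The three TG coding positions discussed in this report.
for c_pos in (2359, 5560, 6695):
    codon, base = codon_of(c_pos)
    print(f"c.{c_pos} -> codon {codon}, base {base}")
```

Under this numbering, c.2359 and c.5560 fall on the first base of codons 787 and 1854, and c.6695 falls on the second base of codon 2232, which is consistent with a C > T change converting a proline codon (CCN) into a leucine codon (CTN).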
Sanger sequencing and NGS identified seven SNVs in family G (p.Ser734Ala, p.Pro778Pro, p.Pro2232Leu, p.Leu2470Leu, p.Trp2501Arg, p.Arg2530Gln and p.Tyr2640Tyr) and four SNVs in family M (p.Met1028Val, p.Pro2232Leu, p.Leu2470Leu and p.Tyr2640Tyr) (Table 1). Thirteen of the 21 variants previously reported in the TG gene were found in the gnomAD v2.1.1 database with an allelic frequency greater than or equal to 1%. The remaining 8 SNVs (p.Gly77Ser, p.Asp142Asp, p.Pro777Leu, p.Gln830Glu, p.Ser1158Ser, p.Thr1498Met, p.Pro1302Pro, and p.Pro2232Leu) did not reach a high population frequency (Table 1). For seven of these 8 variants, both families carried the wild-type allele in homozygous form (Table 1). Interestingly, since the mutated leucine 2232 was present in the heterozygous state in all four patients, we decided to analyze this variant in depth to assess whether it could be a disease-causing variant.
Multiple sequence alignment of the TG and ACHE sequences revealed that wild-type proline 2232 is strictly conserved in the TG and ACHE of all the analyzed species (Supplementary Fig. 1).
To analyze the 14 previously identified non-synonymous SNVs, a bioinformatic prediction study was performed (Supplementary Table 1). The predictors classified ten TG variants as likely benign (p.Ser734Ala, p.Gly815Arg, p.Arg988Pro, p.Met1028Val, p.Asp1312Gly, p.Thr1498Met, p.Asp1838Asn, p.Arg1999Trp, p.Trp2501Arg, and p.Arg2530Gln), whereas p.Pro777Leu and p.Gln830Glu were classified as variants of uncertain significance (VUS), and p.Gly77Ser and p.Pro2232Leu as likely pathogenic. As can be seen in Supplementary Table 1, 18 of the 20 programs predict a deleterious effect of the p.Pro2232Leu variant.
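The frequency-based triage described above (allelic frequency greater than or equal to 1% versus lower) can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function name, the treatment of variants absent from the database as rare, and the example frequencies are all assumptions, and the numbers are placeholders rather than gnomAD values.

```python
COMMON_AF = 0.01  # variants at or above 1% are treated as common polymorphisms


def split_by_frequency(allele_freqs, threshold=COMMON_AF):
    """Split {variant: allele frequency or None} into (common, rare) name lists.

    A frequency of None (variant absent from the database) is counted as rare.
    """
    common = sorted(v for v, af in allele_freqs.items()
                    if af is not None and af >= threshold)
    rare = sorted(v for v, af in allele_freqs.items()
                  if af is None or af < threshold)
    return common, rare


# Placeholder frequencies for illustration only; NOT values taken from gnomAD.
example = {"variant_A": 0.25, "variant_B": 0.01, "variant_C": 0.004, "variant_D": None}
common, rare = split_by_frequency(example)
print("common:", common)
print("rare:", rare)
```

Only the rare group would be carried forward for conservation, prediction, and functional analysis.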
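Strict conservation of a residue across an alignment amounts to checking a single alignment column. The sketch below shows that check on toy data; the aligned fragments are invented for illustration and are not real TG or ACHE sequences, and `column_is_conserved` is a hypothetical helper, not a CLUSTAL W function.

```python
def column_is_conserved(aligned_seqs, column, residue):
    """Return True if every aligned sequence has `residue` at the 0-based `column`."""
    if not aligned_seqs:
        raise ValueError("no sequences supplied")
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return all(s[column] == residue for s in aligned_seqs)


# Toy aligned fragments, invented for illustration ("-" marks an alignment gap).
toy_alignment = ["GAPLW", "GSP-W", "GAPVW"]
print(column_is_conserved(toy_alignment, 2, "P"))  # proline in every sequence
print(column_is_conserved(toy_alignment, 1, "A"))  # one sequence has S here
```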
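The agreement among predictors can be summarized as a simple fraction. The sketch below only restates the count reported here (18 of 20 programs); it is an assumption-laden simplification and does not reproduce how VarSome combines evidence into a final classification.

```python
def deleterious_fraction(calls):
    """Return the fraction of predictor calls that are deleterious (True)."""
    if not calls:
        raise ValueError("no predictor calls supplied")
    return sum(1 for c in calls if c) / len(calls)


# 18 deleterious calls and 2 non-deleterious calls, as reported for p.Pro2232Leu.
calls = [True] * 18 + [False] * 2
print(f"{deleterious_fraction(calls):.0%} of predictors call the variant deleterious")
```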
The ChimeraX program was used to evaluate the impact of the p.Pro2232Leu variant on the 3D structure of the protein.
In the wild-type protein, proline 2232 is found in a more superficial area, near the N-linked glycan at asparagine 2250. Proline 2232 lies in a delimited area containing four hydrogen bonds (Fig. 1.B.a). The p.Pro2232Leu model shows that the variant does not interfere with any of these four bonds. On the other hand, the p.Pro2232Leu variant shows the emergence of clashes, that is, non-physical Van der Waals overlaps, between amino acids tyrosine 2283 and tryptophan 2255, suggesting that leucine 2232 generates a structural accommodation between the affected amino acid and the residues that surround it (Fig. 1.B.b). Because proline has a cyclic and rigid conformation, its replacement by leucine 2232 forces a structural rearrangement that clearly affects the structure of TG. In addition, the hydrophobicity profile changes: the mutated zone is slightly more hydrophobic than the wild-type zone (Fig. 1.B.c).
As a last stage in the analysis of the p.Pro2232Leu sequence variant (p.Pro2233Leu in the rat TG), functional assays were carried out to assess whether this change affects the intracellular transport of TG. Site-directed mutagenesis was performed using the vector containing rat TG cDNA (prTGWT) (Fig. 1.C.a). As observed in Fig. 1.C.b, the mutated clone prTG[p.Pro2233Leu] is retained inside the cell, while the wild-type prTGWT and pmTGWT are found both inside the cell and in the extracellular medium. More precisely, Experiment 1 shows a complete lack of TG in the supernatant, while Experiment 2 shows only traces of TG in the supernatant (Fig. 1.C.b). These results indicate that the p.Pro2232Leu sequence variant in the TG gene causes retention of the protein inside the cell, preventing its transport to the extracellular medium.

Discussion
In this work, we identified deleterious variants in the TG gene as the cause of CH in four patients from two unrelated families [11,12]. Genetic analyses indicate that patients from family G are compound heterozygous for the p.Arg787* and p.Pro2232Leu variants and that patients from family M are compound heterozygous for the p.Glu1854* and p.Pro2232Leu variants (Fig. 1).
Nonsense variants give rise to premature stop codons in the TG coding sequence, resulting in a truncated protein with limited functional capacity for TH biosynthesis. Both truncated forms detected in this study lack the ChEL domain. The p.Arg787* variant comprises only part of region I (Supplementary Figure 2A), while p.Glu1854* includes regions I, II, and only part of region III (Supplementary Figure 2A). Both the p.Arg787* and p.Glu1854* mutants still harbor the acceptor tyrosine 24 and the donor tyrosine 149 but lack the C-terminal hormonogenic domain. In addition, transcripts containing nonsense codons are rapidly degraded in the cytoplasm by the Nonsense-Mediated mRNA Decay (NMD) surveillance mechanism [14].
The main objective of this work was to carry out a bioinformatic and functional analysis of the SNV p.Pro2232Leu, previously described in the ChEL domain of the TG gene and identified in the heterozygous state in both families analyzed (Supplementary Figure 2A, B). We demonstrated that this variant causes intracellular retention in HEK293T cells. Consequently, we can affirm that the p.Pro2232Leu variant in the TG gene affects the folding and trafficking of this secretory protein. The ChEL domain plays a main role in the intracellular trafficking of TG through the secretory pathway, from the thyroid cell to the cell-colloid interface [15,16]. It functions as an intramolecular chaperone and as a molecular escort for the remaining TG regions [15]. The ChEL domain is also essential for TG dimerization [16]. It is well documented that deleterious missense variants in the ChEL domain may cause TG retention in the endoplasmic reticulum [17,18]. Recently, Wright et al. [19] confirmed that TG carrying deleterious variants in the ChEL domain is retained intracellularly and also shows increased interactions with chaperones. On the other hand, Zhang et al. [20] demonstrated that, in TG defects caused by deleterious variants located in the ChEL domain, thyroxine is synthesized from mutant TG that is retained intracellularly and released into the follicle lumen by dead thyrocytes. The released mutant TG is cannibalized by the TH biosynthetic machinery of the surrounding living thyrocytes.
In summary, our bioinformatic and functional analyses show that the p.Pro2232Leu variant has a high degree of pathogenicity. Thus, the p.Pro2232Leu variant, together with the previously identified nonsense variant on the opposite allele, causes CH in the patients studied.
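The genotype logic behind "compound heterozygous" can be made explicit with a small sketch. It assumes that the two variants in each family lie on opposite alleles (in trans), as this report concludes, and that each allele is represented as a set of deleterious variant names; the function name is hypothetical.

```python
def is_compound_heterozygous(allele_1, allele_2):
    """Return True when both alleles carry at least one deleterious variant
    and no variant is shared between them."""
    a, b = set(allele_1), set(allele_2)
    return bool(a) and bool(b) and a.isdisjoint(b)


# Genotypes as reported for the two families (phase assumed to be in trans).
family_g = is_compound_heterozygous({"p.Arg787*"}, {"p.Pro2232Leu"})
family_m = is_compound_heterozygous({"p.Glu1854*"}, {"p.Pro2232Leu"})
print(family_g, family_m)

# Counter-examples: a monoallelic carrier and a homozygote are not compound heterozygous.
print(is_compound_heterozygous({"p.Arg787*"}, set()))
print(is_compound_heterozygous({"p.Pro2232Leu"}, {"p.Pro2232Leu"}))
```

The monoallelic counter-example corresponds to how both families appeared before p.Pro2232Leu was reclassified as deleterious.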

Data availability
Data and material are available from the authors upon request.