It is well established that genotype and phenotype are strongly correlated in XLAS [6–8]. Jais et al reported that large deletions and nonsense mutations confer a 90% probability of end-stage renal disease (ESRD) by the age of 30 years, compared with a 70% risk with splice site mutations and a 50% risk with missense mutations [8]. Gross et al divided men with XLAS into three groups [7]: (1) large rearrangements, frameshift, nonsense, and splice donor site mutations, with a mean ESRD age of 19.8 ± 5.7 years; (2) non-glycine or 3' glycine missense mutations, in-frame deletions/insertions, and splice acceptor site mutations, with a mean ESRD age of 25.7 ± 7.2 years; and (3) 5' glycine substitutions, with an even later onset of ESRD at a mean of 30.1 ± 7.2 years. Bekheirnia et al reported the average onset of ESRD as 37 years for those with missense mutations, 28 years for those with splice site mutations, and 25 years for those with truncating mutations [6]. Although these reports clearly show genotype-phenotype correlations, all of them grouped splice site mutations together without considering their diverse consequences for collagen transcripts. Nozu et al reported that 29% of men with XLAS showed significantly milder phenotypes, including milder proteinuria, later onset of ESRD, and less frequent hearing loss [9]. All of them had nontruncating mutations. These results suggest that in-frame mutations can produce a milder phenotype, even when they derive from a splice site mutation [9]. In this study, our patient had persistent haematuria, mild proteinuria, no sensorineural deafness, and no ocular abnormalities, which amounts to a relatively mild phenotype for a male with XLAS. However, electron microscopy (EM) of the glomerular basement membrane (GBM) showed the typical abnormalities of AS, such as irregular thinning and thickening and a diffuse basket-weave pattern, leading to a diagnosis of AS.
We then identified a c.4298-20T > A variant in intron 46 of the COL4A5 gene, 20 bp upstream of exon 47, in this patient. This variant is not included in the gnomAD database and had not previously been verified in vitro. The patient's parents had no symptoms and had not undergone genetic testing. We used several bioinformatics tools, including SpliceAI, dbscSNV_ADA, dbscSNV_RF, and varSEAK, to analyze the effect of the variant on the primary splice site. All of them predicted aberrant splicing of COL4A5 in the presence of the variant. However, bioinformatics tools cannot always identify the aberrant splice sites and the resulting transcripts accurately. Therefore, for variants that are detected by next-generation sequencing (NGS) and may affect splicing, it is important to validate the predicted effect experimentally [10, 11]. Transcript analysis allows a splice site variant to be classified as truncating or non-truncating, which will help with future analysis of genotype-phenotype correlations.
We applied an in vitro minigene splicing assay to detect the aberrant splicing caused by the variant in the COL4A5 gene. This method can readily detect aberrant exonic or intronic splicing caused by single-base substitutions [12, 13]. Here, we applied it to an XLAS case carrying the single-base substitution c.4298-20T > A in the COL4A5 gene. Transcript analysis of patient samples was not available in this case because of the low peripheral expression of COL4A5. We therefore constructed a vector carrying the promoter, exon 46, and exon 47 of COL4A5. The vector was introduced into the prepared cells, and the resulting transcripts were analyzed by reverse transcription–polymerase chain reaction (RT-PCR). The results demonstrated that the c.4298-20T > A variant caused 18 bp of intron 46 to be retained in COL4A5 transcripts, resulting in the insertion of 6 amino acids after the amino acid at position 1432 in α5(IV). In the eukaryotic genome, the non-coding sequences comprise non-coding regions and introns. Introns are removed during mRNA processing, so no intronic sequence is present in mature mRNA. In addition, introns do not contribute to the structure of translation products and are largely free from the pressure of natural selection; they are therefore more prone to accumulate variants than exons. The original and updated ACMG/AMP guidelines do not specifically cover non-coding sequence variants (other than canonical splice site variants) or splicing defects with deletion of one or more exons, since the pathogenicity of such variants is difficult to establish without experimental data. In the present study, 18 bp of non-coding sequence from intron 46, which should be removed during normal mRNA processing, was retained in the presence of the c.4298-20T > A variant in COL4A5 and thereby converted into coding (exonic) sequence. We therefore reassessed the pathogenicity of the c.4298-20T > A variant according to the ACMG guidelines.
PVS1_Moderate: this splice region variant creates a new splice acceptor site in intron 46, leading to an in-frame insertion of 18 bp (6 aa) at the beginning of exon 47; PP4: the clinical phenotype was highly consistent with the monogenic hereditary disease caused by COL4A5 abnormalities; PP3: bioinformatics software predicted an effect of the variant on splicing; PS3_Moderate: functional analysis demonstrated that the variant affected splicing, resulting in partial intron retention (18 bp); PM2_Supporting: the variant is rare and is not included in the gnomAD database. Combining these lines of evidence, the c.4298-20T > A variant was assessed as likely pathogenic.
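How the five criteria listed above combine into a classification can be illustrated with a short sketch. It assumes the combining rules for pathogenic evidence from the 2015 ACMG/AMP standards and counts each criterion at the strength assigned in the text (two moderate, three supporting). The names `EVIDENCE` and `classify` are hypothetical, benign criteria are omitted, and this is not the procedure or software used in the study.

```python
from collections import Counter

# Evidence as applied in the text: (criterion, strength actually assigned).
EVIDENCE = [
    ("PVS1", "moderate"),    # PVS1_Moderate
    ("PS3", "moderate"),     # PS3_Moderate
    ("PM2", "supporting"),   # PM2_Supporting
    ("PP3", "supporting"),
    ("PP4", "supporting"),
]


def classify(evidence):
    """Combine pathogenic evidence (assumed 2015 ACMG/AMP rules, benign criteria omitted)."""
    n = Counter(strength for _, strength in evidence)
    vs, s, m, p = n["very_strong"], n["strong"], n["moderate"], n["supporting"]

    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m == 2 and p >= 2) or (m == 1 and p >= 4)))
    )
    if pathogenic:
        return "pathogenic"

    likely_pathogenic = (
        (vs >= 1 and m == 1)
        or (s == 1 and 1 <= m <= 2)
        or (s == 1 and p >= 2)
        or m >= 3
        or (m == 2 and p >= 2)
        or (m == 1 and p >= 4)
    )
    return "likely pathogenic" if likely_pathogenic else "uncertain significance"


print(classify(EVIDENCE))
```

Under these assumptions, two moderate plus at least two supporting criteria satisfy a likely pathogenic rule, which agrees with the assessment above.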
α3(IV), α4(IV), and α5(IV) form a triple helix that combines tightly with other triple helices to form the GBM. If one of the three α chains becomes defective because of a pathogenic variant in its encoding gene, the normally highly ordered GBM gradually breaks down, with splitting of the lamina densa, which is referred to as the basket-weave change. These changes accelerate glomerular sclerosis and lead to kidney dysfunction. XLAS is caused by defective α5(IV) chains resulting from a pathogenic variant in COL4A5. A mutation in the COL4A5 gene has one of two consequences: (1) complete loss or shortening of the α5-chain protein product; or (2) a full-length protein product with an amino acid substitution or insertion. The former is easy to understand, as an incomplete protein may not function normally and thereby causes disease. For the latter, it is generally believed that amino acid substitutions/insertions can lead to local kinks or to abnormal folding that lacks a proper triple helix structure, and that abnormally folded collagen molecules are more sensitive to proteases, making them prone to degradation [14]. Therefore, the effect of COL4A5 mutations on the folding of the triple helix is critical to the severity of the clinical phenotype. A strong genotype-phenotype relationship is usually seen in male XLAS cases [6, 8, 15]. With truncating variants (e.g., nonsense variants and deletions/insertions), male XLAS cases show completely negative α5(IV) expression, while female XLAS cases show a mosaic (chimeric) pattern of α5(IV) expression [16–18]. For some non-truncating pathogenic variants (e.g., missense variants), an α345(IV) trimer can still form, although its structure does not exactly match the normal one, and α5(IV)-positive patients present with milder phenotypes than α5(IV)-negative patients [9, 19].
Splicing variants are more complex. Their common outcomes include deletion of all or part of an exon, retention of a whole intron (conversion of intron to exon), partial retention of an intron (splicing occurs at an intronic site that has splice site features but is not the native splice site), and skipping of multiple exons. These aberrant splicing events can lead to deletion of amino acids, premature translation termination, or insertion of several amino acids, as in the case reported here. To clarify the genotype-phenotype relationship in this patient, we used molecular dynamics simulation to analyze the effect of the COL4A5 variant on the ability of the protein to form a triple helix [20]. Homology modeling was employed to construct a three-dimensional structure of the α345(IV) trimer.
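The distinction between aberrant splicing outcomes that preserve the reading frame and those that disrupt it can be made concrete with a small sketch. It only checks whether the net length change of the transcript is a multiple of three and does not inspect the sequence, so a stop codon inside an in-frame insertion would go unnoticed. The function names `codon_of` and `splice_consequence` are hypothetical, and coding-DNA positions are assumed to follow the HGVS convention in which c.1 is the A of the start codon.

```python
def codon_of(cdna_pos):
    """1-based codon number containing a coding-DNA position (assumes c.1 = A of ATG)."""
    return (cdna_pos - 1) // 3 + 1


def splice_consequence(length_change_nt):
    """Classify the net transcript length change (in nt) caused by aberrant splicing.

    Positive values are retained intronic nucleotides, negative values are
    skipped exonic nucleotides. Returns (label, residues gained or lost);
    the residue count is None for a frameshift.
    """
    if length_change_nt == 0:
        return ("no length change", 0)
    if length_change_nt % 3 != 0:
        return ("frameshift", None)
    kind = "in-frame insertion" if length_change_nt > 0 else "in-frame deletion"
    return (kind, abs(length_change_nt) // 3)


# The 18 nt of intron 46 retained in this case.
print(splice_consequence(18))
# c.4298 is the first nucleotide of exon 47 in the variant name c.4298-20T > A.
print(codon_of(4298))
```

For the variant reported here, an 18 nt retention is divisible by three and corresponds to six residues, and c.4298 falls in codon 1433, in line with an insertion following residue 1432.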
The COL4A5 gene contains 51 exons. The sizes of the exons of COL4A5 (excluding 5'- and 3'-untranslated sequences) vary between 27 and 213 bp. Exon 1 contains 283 bp, comprising 202 bp of 5'-untranslated sequence and 81 bp of translated sequence. The translated sequence of exon 1 encodes solely the putative 26-residue signal peptide. Exon 2 encodes the 14 non-collagenous amino-terminal residues and two Gly-X-Y triplets. Thus, exons 2–47 encode the collagenous domain, with exon 47 being a junction exon that encodes the carboxyl-terminal end of the collagenous domain and part of the non-collagenous domain. In our study, the c.4298-20T > A variant caused 18 bp of intron 46 to be retained in COL4A5 transcripts, resulting in the insertion of 6 amino acids after position 1432 of wild-type α5(IV); that is, the additional DYFVEI sequence inserted at the junction of exons 46 and 47 is located at the tail of the α5 chain. We carried out all-atom molecular dynamics simulations of the mutant α5 chain over 100 ns. The RMSD and RMSF results show that insertion of DYFVEI significantly increased the fluctuation of residues 1430–1600 as well as of residues 900–1150 in the mutant α5 chain. Thus, the c.4298-20T > A variant affects the stability not only of the tail but also of the middle part of the α5 chain. We then carried out all-atom molecular dynamics simulations of the mutant α345(IV) trimer over 100 ns. The RMSD and RMSF results show that insertion of DYFVEI significantly increased the fluctuation of the head and end parts as well as of residues 700–1400 in the mutant α345(IV) trimer. Thus, the variant affects the stability not only of the tail but also of the head and middle of the α345(IV) trimer.
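For readers less familiar with the two measures used above, the following sketch shows how RMSD (deviation of a whole frame from a reference structure) and RMSF (fluctuation of each atom about its time-averaged position) are commonly computed. It is an illustration only, not the analysis pipeline of this study: the trajectory is a toy example, the frames are assumed to be already superimposed on the reference, and the function names `rmsd` and `rmsf` are hypothetical.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation of one frame from a reference structure."""
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference))
    return math.sqrt(sq / len(reference))


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        positions = [frame[i] for frame in trajectory]
        mean = tuple(sum(p[k] for p in positions) / n_frames for k in range(3))
        msf = sum(math.dist(p, mean) ** 2 for p in positions) / n_frames
        result.append(math.sqrt(msf))
    return result


# Toy trajectory: 3 frames, 2 atoms; atom 0 is fixed, atom 1 moves along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]

print([rmsd(frame, toy[0]) for frame in toy])  # one value per frame
print(rmsf(toy))                               # one value per atom
```

A residue range with elevated RMSF in the mutant relative to the wild type, such as the ranges reported above, would show up as larger per-atom values in the second output.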
With this method, the triple helix structure of the mutant α345(IV) trimer was simulated and the changes induced by the variant were monitored. Taken together, these simulations suggest that the α345(IV) trimer carrying the intron 46 splicing variant in COL4A5 does not collapse completely, but that its structure changes considerably and the mutation has a substantial impact on its conformation. Whether molecular dynamics simulation can help reflect the clinical severity of pathogenic variants is under investigation in our laboratory.