Mutation Profile of Collagen VI-Related Myopathy in Japan

Background: Collagen VI-related myopathy spans a clinical continuum from severe Ullrich congenital muscular dystrophy to milder Bethlem myopathy. The disease is caused by mutations in COL6A1, COL6A2, or COL6A3. Most reported mutations are de novo; therefore, comprehensive large cohort studies in different ethnicities are required to identify possible associated mutations. Methods: We retrospectively reviewed clinical information, muscle histology, and genetic analyses from 147 Japanese patients representing 130 families, whose samples were sent for diagnosis to the National Center of Neurology and Psychiatry between July 1979 and January 2020. Genetic analyses were conducted by gene-based resequencing, targeted panel resequencing, and whole exome sequencing, in combination with cDNA analysis. Results: Of the 130 families, each with 1-5 members affected by collagen VI-related myopathy, 120 had mono-allelic and 10 had bi-allelic variants of COL6A1, COL6A2, or COL6A3. Among them, 60 variants were in COL6A1, 57 in COL6A2, and 23 in COL6A3, including 37 novel variants. Mono-allelic variants were classified into four groups: missense (69, 58%), splicing (40, 33%), small in-frame deletion (7, 6%), and large genomic deletion (4, 3%). Variants in the triple helical domains accounted for 88% (105/120) of all mono-allelic variants. Conclusions: We report the mutation profile of a large set of Japanese cases of collagen VI-related myopathy. This dataset can be used as a reference to support genetic diagnosis and mutation-specific treatment.
genetic analysis including cDNA analysis, and to correlate the findings with immunostaining for collagen VI on muscle biopsies.

Genomic DNA was isolated from peripheral blood lymphocytes or muscle specimens using standard techniques. All exons and their flanking intronic regions in COL6A1, COL6A2, and COL6A3 were amplified and sequenced directly in 52 families using an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems, Waltham, MA). Sixty-five families were analyzed using the target resequencing panel for muscular dystrophy, a mutation screening method that we developed and have used in our laboratory since 2014 with Ion PGM next-generation sequencing (NGS) [27]. Thirteen families were analyzed by whole exome sequencing because they were initially suspected of having other types of muscular disease.
The splice site-creating variant Chr21:47,409,881 C>T in intron 11 of COL6A1 was screened manually by Sanger sequencing [15].

cDNA analysis
Total RNA was extracted from frozen muscle using a Total RNA Kit (Nippon Gene, Tokyo, Japan), and cDNA was synthesized with an oligo(dT)20 primer using SuperScript IV Reverse Transcriptase (Thermo Fisher Scientific, Waltham, MA) according to standard techniques [8].

Identification of pathogenic variants
Novel pathogenic variants were identified using a previously described method [27] with modifications. Briefly, likely pathogenic variants were defined according to the following criteria: (1) a glycine substitution in the THD; (2) a variant causing exon skipping in the THD; (3) a large genomic deletion; (4) a variant producing a nonsense codon, or a small insertion/deletion causing a premature stop codon, in patients with bi-allelic variants; (5) a missense variant (except a glycine substitution or a substitution outside the THD). If outside the THD, the amino acid substitution had to be a) predicted to be pathogenic by more than one in silico tool (PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), MutationTaster (http://www.mutationtaster.org/), or CADD (http://cadd.gs.washington.edu/)), and/or b) co-segregating with the phenotype within a family. Missense variants were filtered with an allele frequency threshold of <0.01 in gnomAD (https://gnomad.broadinstitute.org/), the NHLBI GO Exome Sequencing Project (http://evs.gs.washington.edu/EVS/), or the integrative Japanese Genome Variation Database (https://ijgvd.megabank.tohoku.ac.jp). The variants identified by target resequencing or whole exome sequencing were confirmed by Sanger sequencing.
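
The criteria above can be read as a small decision procedure. The following is a minimal sketch of one such reading, not the authors' pipeline: the `Variant` fields and the `likely_pathogenic` function are hypothetical names introduced for illustration, and because the wording of criterion (5) is ambiguous, the sketch assumes that every missense variant other than a glycine substitution in the THD needs support from more than one in silico tool and/or co-segregation.

```python
from dataclasses import dataclass

AF_THRESHOLD = 0.01  # allele-frequency cutoff for missense variants, as stated in the text


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative, not from the paper."""
    kind: str                            # "missense", "exon_skipping", "large_deletion", "nonsense", "frameshift"
    in_thd: bool                         # located in the triple helical domain
    glycine_substitution: bool = False   # replaces a glycine residue
    bi_allelic: bool = False             # patient carries variants on both alleles
    max_allele_frequency: float = 0.0    # highest frequency across the population databases
    damaging_tool_calls: int = 0         # in silico tools predicting pathogenicity
    co_segregates: bool = False          # co-segregates with the phenotype in the family


def likely_pathogenic(v: Variant) -> bool:
    """Apply criteria (1)-(5) under the assumptions described in the lead-in."""
    if v.kind == "missense":
        # Missense variants at or above the frequency threshold are filtered out.
        if v.max_allele_frequency >= AF_THRESHOLD:
            return False
        if v.in_thd and v.glycine_substitution:
            return True  # criterion (1)
        # Criterion (5), assumed reading: more than one tool and/or co-segregation.
        return v.damaging_tool_calls > 1 or v.co_segregates
    if v.kind == "exon_skipping":
        return v.in_thd  # criterion (2)
    if v.kind == "large_deletion":
        return True  # criterion (3)
    if v.kind in ("nonsense", "frameshift"):
        return v.bi_allelic  # criterion (4)
    return False
```

Under this reading, a rare glycine substitution in the THD passes on criterion (1) alone, whereas a rare missense variant outside the THD passes only with in silico or segregation support.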

Results
We identified pathogenic variants in a total of 130 families with collagen VI-related myopathy, each with 1-5 affected members, seen at the National Center of Neurology and Psychiatry (NCNP) between July 1979 and January 2020. Among them, 120 families carried mono-allelic and 10 carried bi-allelic pathogenic variants (Table 1). One hundred and forty variants were identified, including 37 novel variants in 40 families; these consisted of 60 allelic variants in COL6A1, 57 in COL6A2, and 23 in COL6A3 (Fig. 1). In 94 families with a mono-allelic variant, the disease was sporadic, with no family history (94/130, 72%). Among the 37 novel variants, we identified 24 missense variants, six splicing variants, three small in-frame deletions, three large deletions, and one nonsense variant (Fig. 2).
Among the ten families with bi-allelic variants, eight had variants in COL6A2, while the other two had variants in COL6A1 and COL6A3, respectively. Six of these ten families had variants in both alleles that produced a premature termination codon or caused aberrant splicing leading to in-frame exon skipping, and all had UCMD phenotypes. One of the ten families, #66, had a nonsense and a missense variant and also exhibited a UCMD phenotype. The affected individuals of the remaining three families had single nucleotide variants causing non-glycine substitutions and all showed BM phenotypes, although family #68 had a 26-bp deletion causing a premature termination codon in one allele.
Three novel heterozygous multiple exon deletions were detected in four families (Fig. 3). The deletions spanned exons 5 to 8 in COL6A1 (Families #3 and #4), exons 8 to 10 in COL6A1 (Family #5), and exons 8 to 10 in COL6A2 (Family #87). All of these large deletions were in-frame and located in the THD.
We performed immunostaining for collagen VI in muscle biopsies from 125 affected individuals in 123 families. Among the 115 patients with a mono-allelic variant, 91% (92/101) of those with the variant within the THD and 71% (10/14) of those with the variant outside the THD showed SSCD. Even the biopsies from families harboring multiple exon deletions showed the typical SSCD staining pattern, suggesting a dominant-negative effect of those variants (Fig. 4). Among the ten families with bi-allelic variants, five showed a CD pattern, while the five families carrying missense variant(s) showed an SSCD or a normal pattern. Observation at high magnification using immunofluorescence staining revealed trace amounts of extracellular collagen VI in the muscle biopsies of three families with CD (Families #64, #67, and #109), while collagen VI was retained within the mesenchymal cells in two families (#61 and #62; Fig. 4i-m).
We reviewed all available muscle imaging data (34 families). At least one of three typical findings in collagen VI-related myopathy (tigroid pattern in the vastus lateralis; target sign in the rectus femoris; a hyperintense rim between the soleus and gastrocnemius) [16] was seen in 85% (29/34) of the families. Of the 29 families with mono-allelic variants in the THD, 86% (25/29) had typical imaging findings. Of the remaining five families, 3/4 (75%) with a variant outside the THD and the only family with a bi-allelic variant also showed typical imaging findings.

Discussion
We have elucidated the mutation profile of collagen VI-related myopathy in Japan (Table 1). Furthermore, we report 37 novel variants in 40 families, comprising 24 missense variants, six splicing variants, three small in-frame deletions, three large genomic deletions, and one nonsense variant. To our knowledge, this genetic information constitutes the mutation profile of the largest single-center cohort. The majority of the variants were mono-allelic (86%, 120/140), and 67% (94/140) of them were likely to be de novo because the parents of the patients were not affected, as has previously been described [6,9,10,17,18]. Therefore, our mutation profile may be useful as a reference for diverse ethnicities. Given that all cases with collagen VI-related myopathy in this cohort were sent to our center from hospitals in Japan, we calculated the occurrence of severe UCMD in Japan as 2.26 cases per year, which corresponds to an estimated incidence of 0.19 per 100,000 births, similar to that found for northern England (0.13/100,000) [4]. This is most likely because the majority of variants are de novo in both ethnicities.
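
The incidence estimate above is a simple rate conversion. As a rough sketch, assuming the incidence is the number of UCMD cases per year divided by annual births and scaled to 100,000, the two reported figures can be used to back-calculate the denominator they imply. The annual number of births is not stated in the text, so the value computed here is an implied quantity, not a reported one, and the function names are illustrative.

```python
def incidence_per_100k(cases_per_year: float, births_per_year: float) -> float:
    """Incidence expressed as cases per 100,000 births."""
    return cases_per_year / births_per_year * 100_000


def implied_births_per_year(cases_per_year: float, incidence: float) -> float:
    """Annual births implied by a case rate and an incidence per 100,000 births."""
    return cases_per_year / incidence * 100_000


UCMD_CASES_PER_YEAR = 2.26   # reported in the text
INCIDENCE_PER_100K = 0.19    # reported in the text

# Denominator implied by the two reported figures (an inference, not source data).
births = implied_births_per_year(UCMD_CASES_PER_YEAR, INCIDENCE_PER_100K)
print(f"Implied births per year: {births:,.0f}")
print(f"Round trip: {incidence_per_100k(UCMD_CASES_PER_YEAR, births):.2f} per 100,000")
```

The implied denominator is about 1.19 million births per year, which gives a quick plausibility check on the reported incidence.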
Among the mono-allelic variants, 88% (105/120) were located in the THD. Mono-allelic variants in the THD were primarily associated with the SSCD staining pattern (91%, 92/101) and a UCMD phenotype (90%, 94/105), whilst mono-allelic variants outside the THD were also associated with SSCD (71%, 10/14) but with a BM phenotype (93%, 14/15). In exceptional cases, genotypes cannot be associated with specific phenotypes, with some variants reported to cause both UCMD and BM phenotypes [9-11,18]. In fact, in our cohort, the families with c.877G>A in COL6A1, c.856-2A>G in COL6A2, or c.943G>A in COL6A2 showed a wide range of phenotypes, from milder BM to more severe UCMD, whereas the phenotypes of families with c.956A>G or c.1022G>A in COL6A1 varied within a narrow range on the border between UCMD and BM.
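
The proportions quoted in this paragraph can be re-derived from their numerators and denominators. The sketch below is only an arithmetic cross-check; the helper names are illustrative, and it assumes percentages were rounded half-up to whole numbers.

```python
def percent_half_up(numerator: int, denominator: int) -> int:
    """Whole-number percentage, rounded half-up using integer arithmetic."""
    return (200 * numerator + denominator) // (2 * denominator)


# (numerator, denominator, percentage as printed in the text)
REPORTED = {
    "mono-allelic variants in the THD": (105, 120, 88),
    "SSCD, variant within the THD": (92, 101, 91),
    "UCMD, variant within the THD": (94, 105, 90),
    "SSCD, variant outside the THD": (10, 14, 71),
    "BM, variant outside the THD": (14, 15, 93),
}


def mismatches(reported: dict) -> list:
    """Labels whose printed percentage differs from the recomputed value."""
    return [
        label
        for label, (num, den, printed) in reported.items()
        if percent_half_up(num, den) != printed
    ]


for label, (num, den, printed) in REPORTED.items():
    print(f"{label}: {num}/{den} = {percent_half_up(num, den)}% (printed {printed}%)")
```

An empty result from `mismatches(REPORTED)` indicates that the printed percentages agree with the counts under this rounding assumption.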
In addition, we found four heterozygous large deletions in families with a UCMD phenotype. All the deletions were located on the N-terminal side of the cysteine residue important for the assembly of the collagen VI tetramer. This is in accordance with all the reported multiple exon deletions [12,14,19-22]. Intriguingly, deletions in the region containing the cysteine residue caused relatively mild phenotypes in our cohort and in previous reports [6,23]. Thus, collagen VI proteins with large genomic deletions, which have the deletions no more than amino acid residues may act in a dominant-negative fashion and show a UCMD phenotype.
In this study, we identified ten families with bi-allelic variants, of which five showed CD and five showed SSCD collagen VI staining patterns in muscle. We presume that families with truncating variants in both alleles will be associated with CD and severe UCMD phenotypes, whilst those with missense variants or in-frame deletions in at least one allele will be associated with SSCD and milder BM phenotypes. In fact, three families with truncating variants in both alleles (CD) and five families with a missense variant or in-frame deletion in at least one allele (SSCD) displayed patterns compatible with this presumption, regardless of the causative gene. Interestingly, the other two bi-allelic families had in-frame deletion(s) in one and in two alleles, but they showed CD and severe UCMD phenotypes. To explore the mechanism causing the loss of collagen VI in muscle in these families, we examined the trace collagen VI remaining in their biopsied muscles. In muscles from patients with truncating variants in both alleles, collagen VI formed small deposits in the extracellular space, while in patients with an in-frame deletion in at least one allele, collagen VI was retained within mesenchymal cells. Thus, in the cases with visible extracellular deposits, the truncated collagen VI molecules could form tetramers and be secreted, but the secreted collagen VI was unstable and degraded extracellularly. In contrast, in the cases with retained traces, the in-frame deleted molecules failed to form tetramers and were not secreted. Additional detailed molecular analyses are required to understand the precise mechanism.
Our results provide comprehensive information on mutation type, incidence, and consequent effects, serving as a 'mutation catalog' that can replace multiple analyses. In cases with splice-site variants, intronic pseudoexon-creating variants, and large genomic deletions, which together account for about one third of the total, cDNA analysis contributed to the successful identification of the variant. cDNA analysis was especially powerful in identifying intronic pathogenic variants (8%, 11/140) that were located outside exon-intron borders and led to cryptic splicing. Using a combination of genomic and transcript analyses together with the collagen VI pathology observed in muscle, we were able to make a conclusive genetic diagnosis in 130/132 cases in which collagen VI-related myopathy was suspected clinicopathologically. However, multiple analytical steps were required to reach the final genetic diagnosis. Because this disease includes a large number of sporadic cases with a de novo collagen VI variant, our comprehensive mutation catalog, together with mutation reports from other published cases, may aid genetic diagnosis in diverse ethnicities.

Conclusion
Our report provides a large mutation catalog of collagen VI-related myopathy in Japan, which can be used as a reference for genetic diagnosis and will also be helpful in mutation-specific therapy in the future. The majority of causal variants of collagen VI-related myopathy were mono-allelic and de novo, and most of them were located in the THD and associated with SSCD and UCMD phenotypes.

Declarations
Ethics approval and consent to participate
All clinical information and materials used in the present study were obtained for diagnostic purposes with written informed consent. The study was approved by the Ethics Committee of the National Center of Neurology and Psychiatry (NCNP).
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Table footnotes (fragment): brothers; j, 83-2 is the mother of 83-1; k, brothers; l, brothers; m, 117-2 is the mother of 117-1; n, two variants on one allele.