Number of the common variants creating m6A motif inside G4 structure are higher than the rare variants
Since, both m6A and G4 found in pre-mRNA are known to affect splicing, localization, translational efficiency and decay of the RNAs, variants in non-spliced sequence of the selected MIM genes were evaluated in terms of their effect on m6A motif Number within putative G4 structures. While most of the variants (94,6%, 50.933 variants) were found not to relate to m6A motif (n → n; inert variants), 75% (2009 variants) of the remaining created a novel m6A motif (n → n+1; augmentative variants), and 25% (851 variants) abolished a preexisting m6A motif (n→ n-1; reductive) inside a putative G4-structure (Supplementary File 1). It was found that the number of rare variants which create a novel m6A motif or increase the number of preexisting m6A motifs inside a G4 structure were statistically lower (p=3.02e-6) than that of the common variants (Table 1).
To clarify whether this situation is limited to G4-included regions, the variants which have a flanking region that is not included in any putative G4 (G4-not-included regions) were analyzed. Contrary to G4-included regions, the Number of rare augmentative variants, which creates a novel m6A motif or increases the number of preexisting m6A motifs inside G4-not-included regions, were found statistically higher (p=6.98e-30) than that of common variants (Table 1).
Table 1
Count of variants which decrease (n -> n-1), increase (n -> n+1) or does not change (n -> n) the m6A motif number (n) inside or outside the G4 structure in pre-mRNA. Rare variants (MAF<0.01) causing increase of m6A motif number were significantly lower than expected while inside the G4 structure, and higher than expected while out of the G4 structure.
m6A motif location
|
m6A motif number
|
MAF value
|
Variant count
|
Chi-square
|
Reference allele
|
Variant allele
|
Observed
|
Expected
|
Inside G4-motif
|
n
|
n-1
|
<0.01
|
509
|
529.913
|
0.82
|
≥0.01
|
763
|
742.087
|
0.58
|
n
|
n
|
<0.01
|
31814
|
31661.47
|
0.73
|
≥0.01
|
44186
|
44338.53
|
0.52
|
n
|
n+1
|
<0.01
|
1174
|
1305.619
|
13.26 *
|
≥0.01
|
1960
|
1828.381
|
9.47 *
|
Outside G4-motif
|
n
|
n-1
|
<0.01
|
994
|
919.87
|
5.97
|
≥0.01
|
7913
|
7987.12
|
0.68
|
n
|
n
|
<0.01
|
23410
|
23836.62
|
7.63
|
≥0.01
|
207396
|
206969.37
|
0.87
|
n
|
n+1
|
<0.01
|
1516
|
1163.5
|
106.79 **
|
≥0.01
|
9750
|
10102.49
|
12.29 **
|
* p=3.02e-6, total chi-square: 25.41, df=2, ** p=6.98e-30, total chi-square: 134,26, df=2
Since both m6A modification and G4 structures are involved in translational rate, localization and stability of mRNA, variants were evaluated in spliced RNA sequence, as well. While the difference between rare and common variants in G4-not-included regions remained statistically significant (p<0.001), there was no statistical significance in distribution of variant counts inside G4 (Table 2).
Table 2
Count of variants which decrease (n -> n-1), increase (n -> n+1) or does not change (n -> n) the m6A motif number (n) inside or outside the G4 structure in mRNA. Rare variants (MAF<0.01) causing increase of m6A motif number were significantly lower than expected while inside the G4 structure, and higher than expected while out of the G4 structure.
m6A motif location
|
m6A motif number
|
MAF value
|
Variant count
|
Chi-square
|
Reference allele
|
Variant allele
|
Observed
|
Expected
|
Inside G4-motif
|
n
|
n-1
|
<0.01
|
0
|
1.09
|
0.128
|
≥0.01
|
13
|
11.9
|
1.395
|
n
|
n
|
<0.01
|
71
|
68.2
|
7.995
|
≥0.01
|
737
|
739.79
|
86.728
|
n
|
n+1
|
<0.01
|
1
|
2.73
|
0.320
|
≥0.01
|
31
|
29.29
|
3.434
|
Outside G4-motif
|
n
|
n-1
|
<0.01
|
695
|
655.42
|
2.39
|
≥0.01
|
5443
|
5482.57
|
0.28
|
n
|
n
|
<0.01
|
12698
|
12857.69
|
1.98
|
≥0.01
|
107714
|
107554.3
|
0.23
|
n
|
n+1
|
<0.01
|
756
|
635.88
|
22.69 *
|
≥0.01
|
5199
|
5319.11
|
2.71 *
|
* p<0.0001, total chi-square: 30.299, df=2
Rare variants creating m6A motif inside G4 avoid being close to 3’-side and prefer to be near 5’-side of pre-mRNAs
Independently of their frequency, both inert and reductive variants displayed an equal distribution throughout pre-mRNAs. Distribution of the augmentative rare variants displayed a statistically significant shift from 3’-end to 5’-end of the pre-mRNAs (Fig 1).
Because the total length and exon-intron contents of each transcript are different, relative position rather than absolute position of the variants is more appropriate for the comparison of their position-dependent effect in pre-mRNAs. To investigate the position-dependent effect of the variants on m6A-G4 colocalization in more detail, their relative positions were clustered as positional quartiles. Distribution of the rare variant ratio through positional quartiles revealed that ratio of inert variants displayed an equal distribution throughout pre-mRNAs. Except for the second quartile, the ratio of reductive variants also displayed an equal distribution. In the second quartile, the ratio of reductive rare variants was found significantly lower than expected. The ratio of augmentative variants displayed unequal distribution in all quartiles. Except for the last quartile, and especially in the first quartile, the ratio of augmentative rare variants was higher than expected. In the last quartile, however, it was lower than expected (Fig 2).
To test whether there is a difference between the distributions of variant positions throughout G4 structure, relative positions of the variants in G4 sequence were compared. There was not found any difference between over-G4 distributions of the variants (Data not shown.).
Colocalization of m6A and G4 seems to increase the thermodynamic stability of the G4 structure
Because of the crucial involvement of thermodynamic stability in G4-structure formation, minimum free energy (MFE) values of the putative G4-structures were compared considering the variants and m6A motif numbers.
Rare variants were shown to have lower MFE value, especially in reductive variants (Fig 3a). When reference and variant bases were taken into account, it was observed that the alternative variant had lower MFE than the reference allele in each group (Fig 3b). While reductive rare variants were found to have lowest MFE, augmentative common variants were found to have highest MFE value (Fig 3a). To test whether this result was due to the presence of the m6A motif, MFE values of G4 sequences were compared in terms of m6A motif number. MFE values were observed to be inversely related to m6A motif count (Fig 3c). Putative G4 structures with three m6A motifs were found to have lowest MFE values (Fig 3d).
Overlapping m6A-G4 may modulate G4 stability in a position-dependent manner
Due to position-dependent colocalization of m6A-G4 and m6A-dependent stability of G4, we can suppose that G4-stability may also be modulated with a position-dependent manner. Additionally, because of the relationship between overlapping m6A-G4 and allele frequency, it is possible that G4-stability may depend also on allele frequency of the variants. In order to test these possibilities MFE values of putative G4 structures were compared considering their relative position and minor allele frequency of the variants that create m6A motif.
Distribution of MFE values over positional quartiles has shown that MFE values were prone to decrease from 5’-end to 3’-end of the pre-mRNAs (Fig 4). To see whether this position-dependent decrease depends on variant alleles, MFE values were evaluated in terms of alleles. Effect of the variant alleles on MFE values was found to depend on their relative position (Fig 5a). Besides, the difference between MFE values of variant allele and reference allele also showed a dependence on relative position (Fig 5b). The MFR-difference (Variant allele MFE minus reference allele MFE) value had a decreasing pattern throughout pre-mRNA. When the distribution of MFE-difference was reevaluated considering the frequency of variant alleles, it was revealed that the decrease in MFE-difference was started closer to 5’-end with common alleles (Fig 5c), and closer to 3’-end with rare variants (Fig 5d).
Real Data Analysis
To test the validity of the findings, two experimental G-quadruplex data were analyzed. First of these data included 13,423 genomic coordinates of G-quadruplex regions yielded from HeLa human transcriptome study dataset, GSE77282 (37). The second data were fetched from the GSE181373 dataset that obtained single-cell mapping of DNA G-quadruplex structures in three human cancer cell lines (K562, MCF7 and U2OS), and included totally 223,696 genomic regions (38). All SNPs and their flanking sequences located inside these genomic coordinates were fetched using USCS Browser (https://genome.ucsc.edu/).
First data included 13,360 unique genomic regions in 6,613 genes. Totally 15,169 known variants were found within these regions. When the regions with extreme length (median length: 11,657 bp, min length: 10 bp, maximum length: 110,841 bp) were filtered out, 823 regions covering at least one variant were yielded (Supplementary File-2). Of these regions, only 110 (13.36%) were found to have classical G4-quadruplex motifs (G1-3[N1-7G1-3]3). G4-quadruplex motif-included sequences were evaluated in terms of DRACH (m6A) motif number with variant alleles. Approximatly 10% of them were found to have changed DRACH motif number; From 0 to 1: 21 variants, from 1 to 2: 13 variants, from 2 to 3: 2 variants, from 1 to 0: 15 variants and from 2 to 1: 4 variants.
Second data included 223,696 genomic regions (198,959 of which were unique). Of these experimentally detected quadruplex regions, which had 2,939.46 bp length in mean (median 1,766 bp), 1,5487 (43,5%) were found to have at least one classical G4-quadruplex motif. Though 34,482 variants (35,603 alleles) were identified to be covered by these regions, only 406 (1,14%) were located inside a classical G4-quadruplex motif (Supplementary File-3). While 81 variants were found to remove a pre-existing classical G4-quadruplex motif, 117 were observed to create a novel classical G4-quadruplex motif. Of the remaining 299 variants, which do not affect the G4-motif, 288 did neither change the m6A-motif number within the G4-quadruplex motif. Only six variants were found to increase the m6A-motif number within the G4-quadruplex motif, while five variants decreased the m6A-motif number.
When the allele frequency was taken into account, no significant difference was found in the distribution of the m6A motif counts (Supplementary Table-1). On the other hand, statistically significant relationships observed in the data yielded from the presumed G4-motifs were not able to test in the real data, because the count of rare variants were not sufficient (Supplementary File-2 and Supplementary File-3)