Germline mutations directions are different between introns of the same gene: case study of the gene coding for amyloid-beta precursor protein

Amyloid-beta precursor protein (APP) is highly conserved in mammals. This feature allowed us to compare nucleotide usage biases in fourfold degenerated sites along the length of its coding region for 146 species of mammals and birds in search of fragments with significant deviations. Even though cytosine usage has the highest value in fourfold degenerated sites in the APP coding region from all tested placental mammals, in contrast to marsupial mammals with their bias toward thymine usage, the most frequent germline and somatic mutations in the human APP coding region are C to T and G to A transitions. The same mutational AT-pressure is characteristic of germline mutations in introns of the human APP gene. However, surprisingly, there are several exceptional introns with deviations in germline mutation rates. Most of those introns surround exons with exceptional biases in nucleotide usage in fourfold degenerated sites. The existence of such fragments in exons 4 and 5, as well as in exon 14, can be connected with the presence of lncRNA genes in the complementary strand of DNA. The exceptional nucleotide usage bias in exons 16 and 17, which contain a sequence encoding amyloid-beta peptides, can be explained either by the presence of yet unmapped lncRNA(s), or by the autonomous expression of a short mRNA that encodes just the C-terminal part of APP, providing an alternative source of amyloid-beta peptides. This hypothesis is supported by the increased rate of T to C transitions in introns 16–17 and 17–18 of the human APP gene relative to other introns.


Introduction
Calculation of biases in nucleotide usage combined with the comparison of actual nucleotide mutations rates is a powerful instrument of the analysis of molecular evolution of a given gene or even genome (Du et al. 2018). Nucleotide usage biases in genomes of prokaryotes are thought (Du et al. 2018) to be mostly the consequences of unequal rates of different nucleotide mutations occurrence (i.e. of the mutational pressure). All possible nucleotide mutations in fourfold degenerated sites of protein coding regions of genes do not cause any changes in the amino acid sequence of encoded polypeptide (Guo et al. 2012). However, they still may cause serious consequences for RNA splicing, for RNA interference, for the binding of transcription factors by DNA, for secondary structure of mRNAs, as well as for efficiency of mRNA translation (Cristillo et al. 2001;Dana and Tuller 2014;Letzring et al. 2010). All abovementioned factors seem to play more important roles in nucleotide usage biases formation in case if we deal with a prokaryotic genome of an average GC-content, when mutational pressure is not too strong. If GC-content is rather high or rather low (when mutational pressure is strong) nucleotide usage biases in all the genes are becoming quite uniform (Khrustalev et al. 2012).
In genomes of eukaryotes, in contrast to genomes of prokaryotes, nucleotide usage biases vary greatly along the length of a chromosome (Bernardi 2000), and along the length of a gene (Khrustalev et al. 2014). That is why parts of eukaryotic chromosomes with the same GC-content once were called "isochores" (Bernardi 2000), while parts of a gene with the same GC-content were later called "intrachores" (Khrustalev et al. 2014).
There are two points of view on the occurrence of nucleotide usage biases. The first one (a teleologic one) states that the occurrence of a bias serves for a certain purpose, and so the bias itself is the product of positive natural selection (Forsdyke 2021). A well-known sample of such interpretation is that GC-content of genomes increases in response to the high temperature of the environment with the aim to stabilize the structure of DNA (Hurst and Merchant 2001). Nowadays many thermophilic bacteria and archaea with average or even low GC-content are known, as well as mesophilic prokaryotes with high genomic GC-content (Hurst and Merchant 2001;Wu et al. 2012). So, the abovementioned connection between the high temperature of the environment and the high GC-content of DNA does not work every time: the increase of GC-content does lead to the stabilization of the DNA duplex, while this factor is not absolutely necessary for prokaryotes that can survive at high temperatures. The second point of view states that biases in nucleotide usage have multiple and usually controversial consequences for the fitness, and they exist because of the imbalance in occurrence rates of different nucleotide mutations (Sueoka 1993;Wu et al. 2012). According to that point of view, nucleotide usage biases are consequences of several biochemical processes, and they are not dictated by positive natural selection, but rather fixed by random genetic drift (Kimura 1989). Using the "mutationist" point of view, one can find causes of the occurrence of "intrachores" in eukaryotic genes. One of the most obvious causes of their existence is the autonomous transcription of certain fragments of a gene or transcription of genes from the complementary strand of DNA (Khrustalev et al. 2014).
In multicellular organisms only germline mutations can be inherited. It means that germline mutations can happen at any stage of ontogenesis on the way from parental zygote to daughter zygote. This way requires specific changes in gene expression patterns for almost each new generation of cells. The rates of nucleotide mutations depend on the strength of the oxidative stress and on the ability of the repair system to fix lesions of nitrogenous bases (Gros et al. 2002). Oxidation and deamination of those nitrogenous bases in DNA are much more frequent when DNA is unwound and the duplex is dissociated (Frederico et al. 1990). Single-stranded DNA appears during replication and transcription. It means that genes transcribed in a certain period of ontogenesis accumulate nucleotide mutations with a certain preferable direction, and that direction is formed by the gene expression pattern. According to the hypothesis written above, autonomously expressed genes of short and long RNAs overlapping with exons of protein-coding genes can cause deviations in the nucleotide usage bias along the length of coding regions of protein-coding genes.
In some species of bacteria (Khrustalev et al. 2015), as well as in viruses (Khrustalev et al. 2020a, b), autonomous expression of certain parts of a coding region leads to drastic changes in nucleotide usage biases inside them. For example, in adeno-associated viruses, the presence of an alternative transcription start site inside the coding region can cause transcription of a shorter RNA that lacks its 5′-end sequence, which leads to a change in nucleotide usage bias in that autonomously transcribed region (Khrustalev et al. 2020a, b). A similar kind of autonomous transcription may also be responsible for the occurrence of deviations in nucleotide usage biases along the length of a coding region of a eukaryotic gene.
Nucleotide usage bias is a retrospective index (Khrustalev et al. 2020a, b). It takes a long time for random genetic drift to fix changes even in fourfold degenerated sites (Khrustalev et al. 2014). So, the study on nucleotide usage biases is the study on history of mutagenesis. To find preferred directions of current mutational process one has to use the data on nucleotide mutations stored in public data bases (Howe et al. 2021). From this point of view, it is quite interesting to compare the current direction of mutagenesis in a gene of Homo sapiens with its historical direction and to check the intriguing hypothesis of autonomous expression of its parts at early steps of embryogenesis.
The number of nucleotide mutations has grown in public data bases in recent years (Howe et al. 2021). So, nowadays it is possible to obtain reliable results on their occurrence rates and on their preferable directions. The presence of a significant bias in occurrence rates of mutations can be considered as direct evidence of the existence of mutational pressure in a human gene. However, even those germline mutations are subject to "initial" negative selection. It means that carriers of absolutely lethal mutations are not sequenced and such mutations do not get into public data bases, unlike neutral mutations and negative mutations that decrease the fitness or increase the probability of the development of a disease, but do not cause the critical impairment of embryogenesis.
Mutations in introns were traditionally considered to be neutral. However, many regulatory elements and genes coding for long and small RNAs are now known to be situated in introns of genes. In this study we calculated the rates of different nucleotide mutations in each intron of the same gene. Since the total number of single nucleotide mutations in introns of that gene was equal to 69,341, the obtained data are meaningful, and there is a need to discuss possible causes of the existence of differences in mutational pressure directions between different introns of the same gene.
The gene that is tested in this study is the one coding for amyloid-beta precursor protein (APP). APP is a large highly conserved transmembrane protein with numerous functions that is known to be expressed during embryogenesis (Gadhave et al. 2020). The length of its extracellular part is equal to 684 amino acid residues. A single alpha-helical transmembrane domain (21 residues) is continued by the intracellular domain (48 residues). In the middle of the extracellular domain there is a disordered part of the protein (residues 194-284) enriched by residues of Asp, Glu, and Thr (Gadhave et al. 2020). It is known that APP protein is expressed in oocytes and embryos of mouse at early stages of development (Fisher et al. 1991). In zebrafish there are two homologs of Human APP gene, and both of them are expressed during the embryonic development (Musa et al. 2001) and involved in cell adhesion (Banote et al. 2020). Human APP is expressed ubiquitously, but is thought to be most important for neural tissue development and functioning (Porayette et al. 2009). It has been proven to be a "synaptogenic" protein which induces presynaptic or postsynaptic differentiation when presented to axons or dendrites, respectively (Baumkötter et al. 2014). Expression of different variants of APP has been detected during early human embryogenesis prior to the formation of neural precursor cells (Porayette et al. 2007).
Processing of the APP may follow two pathways: nonamyloidogenic and amyloidogenic (Lichtenthaler and Haass 2004). The non-amyloidogenic pathway includes proteolysis by alpha-secretase in the position 687-688 leading to the formation of secreted extracellular domain. The amyloidogenic pathway results in the formation of shorter secreted extracellular domain and beta-amyloid peptides (672-711 and 672-713), since beta-secretase cuts the chain in 671-672 position. In both cases gamma-secretase cuts the transmembrane domain in positions 711-712, 713-714, and 720-721. Beta-amyloid peptides were shown to be toxic for differentiated neurons, but they function as a neurotrophic factor for differentiating neurons (Yankner et al. 1990). It means that beta-amyloid peptides should be necessary for correct development and differentiation of neural cells during early steps of embryogenesis (Yankner et al. 1990), but they also play a central role in Alzheimer disease pathogenesis (Lichtenthaler and Haass 2004), as well as in Down syndrome (Head and Lott 2004) and other neurodegenerative diseases (Gupta et al. 2016).
In the current study we found out that introns of APP gene situated around exons encoding beta-amyloid peptides demonstrate specific deviations from the overall direction of mutational pressure. Such deviations are interpreted in light of the hypothesis of the autonomous transcription of truncated APP leading to the production of beta-amyloid peptides without any involvement of beta-secretase.

Materials and methods
As the material for this study we used the nucleotide sequence of the APP (beta-amyloid precursor protein) gene of Homo sapiens from the Ensembl (Howe et al. 2021) data base (ENSG00000142192). Using the coding region of the transcript variant APP-201 of that gene in the NCBI-BLAST analysis (in August 2021) we found 128 more sequences from placental mammals, 4 sequences from marsupial mammals, and 13 sequences from birds that code for the same transcript variant of beta-amyloid precursor protein and contain a minimal number of gaps. Sequences were aligned by the PAM method with the help of the MEGA 11 program (Tamura et al. 2021). The final alignment of amino acid sequences contains just 9.1% of gaps (65 out of 776 positions). The percent of absolutely invariable positions in the amino acid alignment is equal to 68.8% (534 out of 776 positions). The percent of sites that contain the same amino acid in 98.6% of sequences is equal to 80.4% (624 out of 776 positions). The alignment itself can be found in the Supplementary Material file.
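Conservation statistics of this kind can be recomputed from any set of aligned sequences. The Python sketch below is only an illustration: the function name `column_stats`, the gap symbol "-", and the toy alignment are assumptions made here, and the code is not part of the MEGA 11 workflow used in the study.

```python
from collections import Counter

def column_stats(alignment, threshold=0.986, gap="-"):
    """Summarize conservation in a list of equal-length aligned sequences.

    Hypothetical helper: counts columns that contain at least one gap,
    columns where every sequence carries the same residue, and columns
    where one residue is present in at least `threshold` of sequences.
    """
    n_seqs = len(alignment)
    gapped = invariant = near_invariant = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if gap in counts:
            gapped += 1
        # Count of the most frequent residue in the column, ignoring gaps
        top = max((c for r, c in counts.items() if r != gap), default=0)
        if top == n_seqs:
            invariant += 1
        if top / n_seqs >= threshold:
            near_invariant += 1
    return {
        "columns": len(alignment[0]),
        "gapped": gapped,
        "invariant": invariant,
        "near_invariant": near_invariant,
    }

# Toy alignment (not real APP data); a low threshold is used because
# only three sequences are present.
toy_alignment = ["MKV-A", "MKV-A", "MRVLA"]
print(column_stats(toy_alignment, threshold=0.66))
```

With 146 aligned sequences, a threshold of 0.986 corresponds to at least 144 sequences sharing the same residue in a column.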
To show the species included in the study and to highlight the quality of the sequences used, we built a phylogenetic tree using the minimum evolution method based on LogDet evolutionary distances (Fig. 1) calculated with the same MEGA 11 program (Tamura et al. 2021). The conventional branching of that tree (Hedges et al. 2015) shows that no pseudogenes or other questionable sequences were used in this study.
Nucleotide usage in fourfold degenerated sites has been calculated for each studied sequence in windows of 150 codons in length with a step of one codon with the help of the VVTAK SW (chemres.bsmu.by) algorithm (Khrustalev et al. 2015). Since APP gene is quite conserved, we aligned the data on nucleotide usage biases for 129 sequences of placental mammals, for 4 sequences of marsupial mammals, and for 13 sequences of birds, and analyzed them. We calculated standard deviations for usages of each nucleotide in fourfold degenerated sites, and then calculated their average value to reflect the degree of variability of nucleotide usage bias in a window 150 codons in length. Also, we calculated percentage of sequences with a bias that is not C4f in each site of the alignment of sequences from placental mammals, and percentage of sequences with a bias that is not T4f in each site of alignments of sequences from marsupial mammals and birds.
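The sliding-window calculation described above can be sketched in Python as follows. This is a minimal illustration and not the VVTAK SW algorithm itself: the function name `fourfold_usage` is hypothetical, the standard genetic code is assumed (eight codon families in which any third-position nucleotide encodes the same amino acid), and usage is expressed as a fraction of the fourfold degenerated sites found in each window.

```python
# First two nucleotides of the eight fourfold degenerated codon families
# in the standard genetic code: Leu (CTN), Val (GTN), Ser (TCN), Pro (CCN),
# Thr (ACN), Ala (GCN), Arg (CGN), Gly (GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}

def fourfold_usage(cds, window=150, step=1):
    """Return one dict of A4f/C4f/G4f/T4f fractions per sliding window."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    results = []
    for start in range(0, len(codons) - window + 1, step):
        counts = dict.fromkeys("ACGT", 0)
        for codon in codons[start:start + window]:
            # Only third positions of fourfold degenerated codons are counted
            if codon[:2] in FOURFOLD_PREFIXES and codon[2] in counts:
                counts[codon[2]] += 1
        total = sum(counts.values())
        results.append(
            {nuc: (counts[nuc] / total if total else 0.0) for nuc in "ACGT"}
        )
    return results

# Toy coding sequence of five codons, scanned with a window of four codons
for usage in fourfold_usage("GCCGCTATGCTCGGA", window=4):
    print(usage)
```

For a real coding region the default window of 150 codons and step of one codon would reproduce the window geometry used in this study.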
We used the complete set of germline mutations in the human APP gene from the Ensembl data base (Howe et al. 2021), and the complete set of somatic mutations in the same gene from the COSMIC data base (Alsulami et al. 2021). To calculate the rates of nucleotide mutations in each intron we divided the number of sites with each type of single nucleotide mutation of a given direction (e.g. with G to A transition) by the number of sites with the initial nucleotide in the consensus sequence (e.g. with G). In the coding region we distinguished synonymous mutations, missense mutations, and nonsense mutations. So, we calculated numbers of sites in which a given nucleotide mutation (e.g. C to T) is synonymous, missense, and nonsense in the consensus sequence. Then we divided observed numbers of sites with synonymous, missense, and nonsense mutations by the respective numbers of available sites for them. Rates of mutations have been compared with each other by the t-test for relative values. The information on predicted genes coding for lncRNAs and snoRNAs was obtained from the description of corresponding regions of chromosomes from the Ensembl data base, as well as the information on borders of exons and introns that are conserved in the studied lineage (Howe et al. 2021).
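The rate definition for introns (sites carrying a mutation of a given direction divided by sites carrying the initial nucleotide) can be illustrated with a short Python sketch. The function name `mutation_rates`, the tuple layout of the mutation records, and the toy data are assumptions introduced here for illustration; they do not reproduce the Ensembl or COSMIC file formats.

```python
from collections import Counter, defaultdict

def mutation_rates(consensus, mutations):
    """Rate of each mutation direction in one region.

    consensus: consensus nucleotide sequence of the region (e.g. an intron)
    mutations: iterable of (position, ref, alt) with 0-based positions
               (a hypothetical record layout)
    """
    consensus = consensus.upper()
    base_counts = Counter(consensus)
    sites = defaultdict(set)
    for pos, ref, alt in mutations:
        # A site is counted once per direction, however many records hit it
        if consensus[pos] == ref and ref != alt:
            sites[(ref, alt)].add(pos)
    return {
        f"{ref}>{alt}": len(positions) / base_counts[ref]
        for (ref, alt), positions in sites.items()
    }

# Toy region with three G, three C, one A, and one T
toy_consensus = "GGCATGCC"
toy_mutations = [(0, "G", "A"), (1, "G", "A"), (0, "G", "A"),
                 (2, "C", "T"), (4, "T", "C")]
print(mutation_rates(toy_consensus, toy_mutations))
```

The same division applies in the coding region once the denominator is restricted to sites at which the given mutation would be synonymous, missense, or nonsense.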

Nucleotide usage biases in APP coding region from Homo sapiens, Vombatus ursinus, and Taeniopygia guttata
Nucleotide usage in fourfold degenerated sites along the length of the Human APP gene coding region is shown in Fig. 2a. As one can see, the usage of cytosine (C4f) is dominating along the most of the length of this coding region. However, there are two noticeable short fragments where A4f becomes higher than C4f: from codon 182 to codon 197 (exon 5), and in codons 248 and 249 (exon 6). Since we use a window of 150 codons in length for this kind of calculation, we cannot ignore even those short fragments with the reversed bias. The usage of T4f becomes the highest one in a short fragment from codon 424 to codon 435 (exon 10), as well as in the long 3′-terminal part of the coding region: starting from the codon 603 until the last codon 771 (exons 14-18). Interestingly, in this region there are also two short islands with A4f values that are higher than T4f: in codons 646-652 and 660-663 (exons 14-16). This kind of picture of biases distribution should be compared with those for the same gene from other animals to exclude the influence of occasional events.
In the coding region of the APP gene from wombat (a representative of marsupials) the usage of T4f is higher than the usage of other nucleotides throughout most of its length (Fig. 2b). Only in the 3′-terminal part of the coding region (starting from codon 608) A4f becomes higher than T4f. Obviously, mutational bias was different for the APP gene in the lineage leading to Human (C-pressure) than in the lineage leading to wombat (T-pressure). However, in both of these lineages the 3′-part of that gene demonstrates its own mutational bias: T-pressure in the Human lineage and A-pressure in the wombat lineage.
In the coding region of the APP gene from Taeniopygia guttata (a bird known as the zebra finch) the distribution of nucleotide usage biases is quite mosaic (Fig. 2c). T4f demonstrates the highest usage in most of the fragments, while C4f has the highest level in the region from codon 281 to codon 376, and A4f is higher than usages of three other nucleotides in three regions: codons 160-166; codons 226-272; codons 656-771. Even in the gene of a bird the 3′-part of the coding region shows its own nucleotide usage bias. To check whether this is the case for most of the sequences, we studied those biases in the set of 129 sequences of the APP gene from different species of placental mammals.

Analysis of nucleotide usage biases in numerous APP coding regions from placentals, marsupials, and birds
In Fig. 3a we show several types of data. One of the graphs from Fig. 3a shows the degree of variation of the bias in fourfold degenerated sites among sequences from 129 species of placental mammals. An average standard deviation (multiplied by 10) for A4f, T4f, C4f, and G4f has two highest peaks: at codon 194 (in the exon 5) and at codon 635 (in the exon 14). It means that in those fragments of the APP coding region nucleotide usage biases are quite different among studied species. In contrast, in the fragment from codon 300 to codon 350 (in the exon 7) variations in nucleotide usage in fourfold degenerated sites show the lowest value.
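The variability index plotted in Fig. 3a (the average of the standard deviations of A4f, T4f, C4f, and G4f among species in a given window) can be sketched as follows. The function name `mean_sigma` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was applied.

```python
from statistics import pstdev

def mean_sigma(usages_by_species):
    """Average, over the four nucleotides, of the between-species
    standard deviation of usage in fourfold degenerated sites.

    usages_by_species: one dict {"A":..., "T":..., "C":..., "G":...}
    per species, all taken from the same window of the alignment.
    """
    sigmas = [
        pstdev([usage[nuc] for usage in usages_by_species])
        for nuc in "ATCG"
    ]
    return sum(sigmas) / len(sigmas)

# Two toy species that differ only in C4f and G4f
toy_window = [
    {"A": 0.2, "T": 0.2, "C": 0.4, "G": 0.2},
    {"A": 0.2, "T": 0.2, "C": 0.2, "G": 0.4},
]
print(10 * mean_sigma(toy_window))  # value scaled as in the "Sigma·10" curve
```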
There are also three graphs in Fig. 3a that show the percent of sequences in which A4f, T4f, and G4f demonstrate the highest usage in a window of 150 codons. Thanks to this representation one can see that in the exon 5 there may be any bias, depending on the species, while the order is like this: C4f > A4f > T4f > G4f. In the exon 6 the bias is also quite variable, and the order of the abundance of that bias is: C4f > A4f > G4f > T4f.
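The fractions of sequences with each prevailing bias in a window can be obtained by finding the most used nucleotide for every species and counting the outcomes. The sketch below is illustrative only; the function name `dominant_bias_fractions` and the tie-breaking rule (the first nucleotide in the order A, C, G, T wins a tie) are assumptions made here.

```python
from collections import Counter

def dominant_bias_fractions(usages_by_species):
    """Fraction of species in which each nucleotide has the highest
    usage in fourfold degenerated sites within one window."""
    winners = Counter(
        max("ACGT", key=usage.get)  # ties resolve to the earliest of A, C, G, T
        for usage in usages_by_species
    )
    n = len(usages_by_species)
    fractions = {nuc: winners[nuc] / n for nuc in "ACGT"}
    # Share of species whose bias differs from the C4f bias
    fractions["non-C"] = 1 - fractions["C"]
    return fractions

# Four toy species: two with C4f bias, one with A4f bias, one with T4f bias
toy_window = [
    {"A": 0.1, "C": 0.5, "G": 0.2, "T": 0.2},
    {"A": 0.2, "C": 0.4, "G": 0.1, "T": 0.3},
    {"A": 0.4, "C": 0.3, "G": 0.1, "T": 0.2},
    {"A": 0.2, "C": 0.2, "G": 0.1, "T": 0.5},
]
print(dominant_bias_fractions(toy_window))
```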
Starting from the exon 10, the T4f value shows several peaks, and starting from the exon 11 the G4f also shows several peaks. However, the usage of C4f prevails in less than 50% of sequences only in two regions: centered at exons 14 and 16. As one can notice in Fig. 3a, starting from the exon 14 all four types of bias are becoming possible. Coming back to Fig. 2a, this can be explained by the fact that usages of all four nucleotides are quite close to each other in the 3′-part of the coding region. Taken together, C4f usage in APP coding region in placental mammals may be decreased in exons 5 and 6, as well as in 3′-terminal ones starting from the exon 14.
In Fig. 3b we present information on APP coding regions from four species of marsupials. In all of them the usage of T4f prevails, while in the 3′-part (after codon 610) it changes to A4f. Also, there are several regions in which T4f can be changed to A4f in some species, and a few regions in which C4f may prevail (Fig. 3b).
In APP coding regions of birds (Fig. 3c) there are fragments in which each of the four nucleotides prevails: G4f prevails in codons 97-120; C4f prevails in codons 275-380; T4f prevails in codons 380-660; while A4f becomes a predominant one in codons 160-270, as well as in codons 661-771 (in the 3′-terminal part of the coding region). Interestingly, standard deviation of nucleotide usage bias is almost the same through the whole length of APP coding region from different species of birds (Fig. 3c), unlike in sequences from different species of mammals (Fig. 3a, b).
Fig. 2 Nucleotide usage biases in fourfold degenerated sites along the length of APP coding region A from Human; B Wombat; and C Zebra finch. The length of a sliding window is equal to 150 codons, the step is equal to 1 codon
Fig. 3 Deviations from the prevailing nucleotide usage bias for sequences from placental animals (A), marsupial animals (B), and birds (C). "Non-C4f" shows the fraction of sequences with nucleotide usage bias that is different from C4f bias in a window 150 codons in length. "non-T4f" shows the fraction of sequences with nucleotide usage bias that is different from T4f bias in a window 150 codons in length. "A4f", "T4f", "C4f", and "G4f" show fractions of sequences with corresponding nucleotide usage biases in a window 150 codons in length. "Sigma·10" shows an average standard deviation for nucleotide usage biases between all the sequences from the alignment in a given window 150 codons in length multiplied by 10. Borders of exons are provided according to the description of Human APP gene
Analysis of biases has led us to the hypothesis of the existence of several autonomously transcribing parts of the APP gene. To check this hypothesis, we analyzed rates of nucleotide mutations in Human APP gene.

Directions of germline and somatic mutations in the coding region of human APP gene
In Table 1 the rates of different nucleotide germline mutations in Human APP coding region are compared with each other. Both synonymous and missense C to T transitions are significantly more frequent than synonymous and missense T to C transitions. The rate of synonymous C to T transitions is significantly higher than the rate of missense C to T transitions, and that is an indicator of negative selection acting on those mutations (Kumar and Patel 2018). In contrast, only missense G to A transitions occur at a significantly higher rate than missense A to G transitions. The difference in rates of synonymous G to A and A to G transitions is not significant. Unexpectedly, the rate of synonymous G to A transitions is significantly lower than the rate of missense G to A transitions, that is usually interpreted as the evidence of positive selection of such nucleotide mutations (Kumar and Patel 2018).
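Comparisons of this kind rest on testing whether two rates differ. The text names the procedure only as a t-test for relative values and does not give its formula, so the Python sketch below shows one common form of such a test as an assumption: the difference between two proportions divided by its standard error. The function name `proportion_t_statistic` and the numbers in the example are hypothetical and are not taken from Table 1.

```python
from math import sqrt

def proportion_t_statistic(hits1, sites1, hits2, sites2):
    """Test statistic for the difference between two rates.

    hits:  number of sites carrying the mutation of a given direction
    sites: number of sites available for that mutation
    Assumed form: |p1 - p2| / sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
    """
    p1 = hits1 / sites1
    p2 = hits2 / sites2
    standard_error = sqrt(p1 * (1 - p1) / sites1 + p2 * (1 - p2) / sites2)
    return abs(p1 - p2) / standard_error

# Hypothetical counts: 30 mutated sites out of 100 versus 10 out of 100
statistic = proportion_t_statistic(30, 100, 10, 100)
print(round(statistic, 3))
```

For large numbers of sites, values of this statistic above roughly 1.96 correspond to a two-sided P < 0.05 under the normal approximation.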
In Table 2 we compare the rates of different types of somatic mutations in APP gene found in cancer cells. During somatic mutagenesis the rates of both synonymous and missense C to T transitions are again significantly higher than the rates of synonymous and missense T to C transitions. But this time the rate of synonymous C to T transitions is almost the same (the difference is not significant) as the rate of missense C to T transitions. It means that natural selection is not working on C to T transitions in APP gene from cancer cells (Kumar and Patel 2018). Indeed, that gene is not directly involved in oncogenesis or immune escape. From this point of view, it is strange to notice that the rate of missense G to A transitions is (as during germline mutagenesis) significantly higher than the rate of synonymous G to A transitions (Kumar and Patel 2018). The rate of missense G to A transitions is significantly higher than the rate of missense A to G transitions.
The source of missense G to A mutations has been found by us in the region of the APP gene that codes for the intrinsically disordered part of the protein (codons 194-283). That region is enriched by Glu residues (coded by GAG and GAA codons) and Asp residues (coded by GAC and GAT codons) (Gadhave et al. 2020). The C-terminal part of the APP is also enriched by Glu residues. So, G to A mutations in first positions of those codons lead to Glu to Lys substitutions (26 out of 136 missense G to A germline mutations) and Asp to Asn substitutions (15 out of 136 missense G to A mutations). While the latter substitution is not so radical, as is also the case for the Val to Ile (18 out of 136 missense G to A mutations), Val to Met (15 out of 136 missense G to A mutations), and even Ala to Thr (18 out of 136 missense G to A mutations) ones, the most frequent Glu to Lys substitution is quite radical (i.e. a negatively charged residue is substituted by a positively charged one). Among all of the studied sequences of APP protein from different species, residues of Glu also mutate frequently, but they are substituted mostly by Asp (in 22 out of 35 such sites), and not by Lys (in 3 out of 35 sites). Less radical substitutions are frequent both during the germline mutagenesis and during the evolution: Ala to Thr (16 out of 40 sites); Val to Ile (12 out of 39 sites). However, Asp to Asn substitutions (2 out of 25 sites) are not so frequent in the course of evolution as Asp to Glu substitutions (11 out of 25 sites). These data show that negatively charged residues of Glu and Asp are conserved in the disordered part of the APP, as well as in other fragments, and they can easily replace each other. It means that missense germline mutations of G to A leading to Glu to Lys and Asp to Asn substitutions cannot be beneficial for the development of the offspring. Hence, there is no positive selection for them during germline mutagenesis, but the probability of their occurrence is increased in certain locations of the coding region due to the presence of processed repeats coding for the intrinsically disordered part of the protein.
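The link between G to A transitions and the amino acid substitutions listed in this section can be checked directly against the standard genetic code. The Python sketch below is illustrative: the function name `g_to_a_consequences` is hypothetical, and the codon table is built from the conventional TCAG ordering of the standard code.

```python
# Standard genetic code in the conventional TCAG order ("*" marks stop codons)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def g_to_a_consequences(codon):
    """List every single G to A transition possible in a codon as
    (codon position, mutant codon, original residue, new residue)."""
    codon = codon.upper()
    results = []
    for pos, base in enumerate(codon):
        if base == "G":
            mutant = codon[:pos] + "A" + codon[pos + 1:]
            results.append(
                (pos + 1, mutant, CODON_TABLE[codon], CODON_TABLE[mutant])
            )
    return results

# Codons for Glu, Asp, Val, and Ala discussed in the text
for codon in ("GAG", "GAC", "GTC", "GTG", "GCC"):
    print(codon, g_to_a_consequences(codon))
```

In this listing a G to A transition in the first position of GAG gives AAG (Glu to Lys), while the same transition in the third position gives GAA and remains synonymous.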

Directions of germline mutations in introns of human APP gene
The number of known mutations in introns is much higher than the number of substitutions in exons, since an intron is usually much longer than an exon. That is why we decided to calculate the rates of nucleotide mutations in each intron of the Human APP gene. The overall direction of nucleotide mutations (the prevalence of C to T and G to A transitions) is the same in introns and exons of the APP gene. However, there are some exceptional introns (Table 3). In the intron 10-11 there is no significant difference between the rates of C to T and T to C transitions, as well as between G to A and A to G transitions. One of the reasons for this may be the short length of this intron (725 nucleotides). Other exceptional introns are longer: in the intron 9-10 and the intron 12-13 the rates of G to A transitions and A to G transitions are equal to each other.
Interestingly, the rates of C to T and G to A transitions are the same in most of the introns, except introns 1-2 and 14-15. In these two introns the rate of C to T transitions is significantly higher than the rate of G to A transitions.
The rate of A to G transitions is significantly higher than the rate of T to C transitions in most of the introns, except introns 14-15, 16-17, and 17-18. In those three 3′-terminal introns the rate of T to C transitions is the same as the rate of A to G transitions.
Some intriguing data can be obtained from the rates of G to T transversions as well. The rate of G to T transversions is significantly higher than the rate of T to G transversions in most of the introns, except introns 4-5, 7-8, 10-11, 14-15, 15-16, 16-17 (Table 3). The rate of T to G transversions is significantly higher in introns 15-16 and 16-17 than in each of other introns, except introns 7-8 and 10-11.
It is important to highlight that introns with deviations in nucleotide mutations rates are situated near exons with deviations in nucleotide usage biases. There is a need to discuss these data in light of the existence of self-transcribing elements in APP gene of Human and domestic mouse with the aim to check for any associations between the presence of such elements and changes in nucleotide usage and rates of nucleotide mutations.

Possible connection between autonomous expression of lncRNA and deviations in mutational pressure in introns and exons of APP gene
In the APP gene of Human there are several lncRNAs (long noncoding RNAs). Four of those lncRNAs are situated near the exon 1. Other seven lncRNAs have been found in the intron 13-14, as well as a snRNA (small nuclear RNA). The latter one has also been mapped in APP gene of other species. In the mouse APP gene lncRNA and snoRNA (small nucleolar RNA) are mapped in the intron 1-2. Another lncRNA from the mouse APP gene is situated on the complementary strand in front of exons 4 and 5, and in front of intron 4-5. One more predicted lncRNA occupies the fragment of the complementary strand in front of the exon 14 and introns 13-14 and 14-15. In the intron 11-12 there is also a processed pseudogene. Several lncRNAs connected with the pathogenesis of Alzheimer disease have been described, but they are situated in other regions of the genome (Ma et al. 2020;Idda et al. 2018).
One may try to connect the existence of lncRNAs in the intron 1-2 with the increased rates of C to T transitions. One can also link the existence of lncRNA in the complementary strand of exons 4 and 5 with the decrease of G to A transitions rate, as well as with high variability of nucleotide usage biases in the exon 5. Another connection is there between the existence of lncRNA on the complementary strand to the exon 14, high variability of the bias in that exon, and the increased rate of T to C transitions in the intron 14-15.
The existence of lncRNAs in intron 13-14 in Human APP gene is not connected with any deviation in rates of nucleotide mutations, while both exon 13 and exon 14 demonstrate T4f-bias in many sequences of placental mammals. Exons 6 and 10 are not surrounded by predicted autonomously transcribing elements, while introns 9-10 and 10-11 demonstrate identical rates of G to A and A to G transitions. Indeed, if a lncRNA is expressed during embryogenesis, mutations that occur during that period should result in the nucleotide usage bias that is inherited. If a lncRNA is expressed only in differentiated cells, its expression cannot lead to any inherited nucleotide usage bias. From this point of view, one may try to find out whether a given lncRNA is expressed during embryogenesis or gametogenesis using our simple approaches. However, the direction of mutagenesis may occasionally be the same for the whole gene and for the lncRNA.

Fundamental questions raised by the current study
Facts revealed in this study need to be discussed from the point of view of importance for molecular evolution and current knowledge on Alzheimer disease pathogenesis. Here we proved that the rates of different types of so-called germline nucleotide mutations are different in some specific introns of the same Human gene. The overall direction of nucleotide mutations is the same in exons and introns, but it is opposite to the observed bias in nucleotide usage in fourfold degenerated sites. And even that bias is not uniform through the whole length of the same coding region. Intriguingly, in APP gene of marsupial mammals there is T-bias along the most of the length, and A-bias in the 3′-terminal part of the sequence, while in placental mammals there is C-bias along the most of the length, and its 3′-terminal part demonstrates a different bias (either T, or G, or A-bias) in more than 65% of studied species. These facts taken together can be explained by the frequently changing mutational pressure (both during the evolution and during different steps of embryogenesis and gametogenesis) that may be local due to the autonomous transcription, rather than by species-, exon-, and intron-specific positive natural selection for different nucleotide usage biases.
The results of the current study raise a serious question about the origin of so-called germline mutations. Since the numerous Glu to Lys substitutions found among the consequences of germline mutations are not fixed during evolution, they should be neither beneficial nor even neutral. So, it seems that these mutations are frequent not in real germ cells, but in stem cells from buccal mucosa or in blood stem cells, in which APP does not play as significant a role as in neurons. Mutations found in stem cells might therefore occur during embryogenesis after the divergence of future germ cells and future stem cells. To test this hypothesis, one needs to check mutations in the APP gene of the offspring and/or parents of a sequenced individual. Interestingly, the direction of nucleotide usage bias (C4f-bias) is opposite to the predominant direction of germline nucleotide mutations (GC to AT) not only in the APP gene, but also in the gene coding for human epidermal growth factor receptor (EGFR) (Khrustalev et al. 2019). However, in the latter gene the rates of germline missense mutations are not significantly higher than the rates of synonymous mutations (Khrustalev et al. 2019).
The elevated rate of missense G to A mutations can be explained by the increased usage of G in first codon positions in some fragments of the coding region. Moreover, the disordered part of the protein is encoded by exons 5 and 6. These exons demonstrate a quite variable nucleotide usage bias that differs among animals. The rates of mutations in intron 4-5 are also distinctive. In the mouse APP gene there is a lncRNA on the complementary strand that is quite close to exon 5. So, the increase in the rate of G to A missense mutations may be caused, at least partially, by C to T transitions that occur on the nontranscribed strand of DNA during the expression of that lncRNA and that are read as G to A on the coding strand of APP.
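The strand argument above can be made explicit with a short Python sketch. The function name and the miniature codon table are assumptions introduced only for illustration. The sketch shows that a C to T transition on one DNA strand is read as a G to A transition on the opposite strand, and that a G to A change in the first position of a Glu codon (GAA or GAG) gives a Lys codon (AAA or AAG).

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Only the four codons needed for the Glu to Lys example (standard genetic code).
MINI_CODON_TABLE = {"GAA": "Glu", "GAG": "Glu", "AAA": "Lys", "AAG": "Lys"}


def on_opposite_strand(ref, alt):
    """Express a substitution ref -> alt as it is read on the opposite strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


def mutate_first_position(codon, alt):
    """Return the codon obtained by replacing its first base with `alt`."""
    return alt + codon[1:]


# A C to T transition on the strand complementary to the APP coding strand
# appears as a G to A transition on the coding strand.
ref, alt = on_opposite_strand("C", "T")
print(ref, "to", alt)

# G to A in the first position of a Glu codon yields a Lys codon.
for codon in ("GAA", "GAG"):
    mutant = mutate_first_position(codon, alt)
    print(codon, MINI_CODON_TABLE[codon], "->", mutant, MINI_CODON_TABLE[mutant])
```

This is why a high usage of G in first codon positions, combined with C to T pressure on the opposite strand, is expected to raise the rate of missense G to A mutations on the coding strand.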
There are several known transcript variants of the APP gene. Some of those variants lack exons 16 and 17, while others are relatively short mRNAs containing the sequence encoding beta-amyloid peptides. The protein-coding APP-207 transcript, discovered in fibroblasts of a patient suffering from Lesch-Nyhan syndrome (Nguyen 2014) and containing exons 1-2, exon 3 (partially), and exons 16-18, may be an alternative source of beta-amyloid peptides in those cells. Indeed, epigenetic regulation of APP expression and alternative splicing may be responsible for its involvement in the pathogenesis of Lesch-Nyhan and Down syndromes, as well as of other diseases (Nguyen 2014). Certain mutations in genes such as the one coding for hypoxanthine-guanine phosphoribosyltransferase may indirectly cause upregulation or downregulation of the expression of the whole APP gene, changes in the proportion of splicing variants of its mRNA (Nguyen 2014), and, probably, also expression of its autonomously transcribed elements. However, alternative splicing of an RNA transcript cannot influence biases in nucleotide usage in a gene, nor the directions of germline and somatic mutations in specific regions of that gene. In contrast, changes in the expression of the whole gene or of its autonomously transcribed parts should be reflected in deviations in somatic mutation rates and directions.
The whole APP gene is quite conserved. Sequences of lncRNAs are also known to be conserved (Ma et al. 2020). That is why we may suspect that the lncRNAs mapped in the mouse APP gene function in many other species as well. Those lncRNAs may or may not be expressed during embryogenesis. Also, mutational pressure at those stages of embryogenesis may differ between species. That is why the expression of lncRNAs may or may not cause nucleotide usage biases in coding regions and in introns.

Possible connection between deviations in mutational pressure in introns and exons from the 3′-end of APP gene and Alzheimer disease pathogenesis
From the data obtained in this study one can infer that the sequence that encodes beta-amyloid peptides has some additional functions. That sequence is situated in exons 16 and 17. Nucleotide usage biases are exceptional in those exons in placentals and marsupials, as well as in birds. The rates of T to C transitions are increased in the introns adjacent to those exons (introns 16-17 and 17-18) in the human APP gene. However, no autonomously expressed sequences have yet been predicted in that 3′-end of the APP gene. So, we may suspect that some lncRNAs are expressed from that part of the gene, or that the 3′-end itself may be autonomously expressed at some stages of embryogenesis or even during gametogenesis (Silva et al. 2015).
It is likely that in some cases of Alzheimer disease the autonomous expression of beta-amyloid peptides may take place, just as during early steps of embryogenesis. According to Volloch et al. (2020), the failure of all attempts to block the amyloidogenic pathway of APP processing is evidence of the existence of an alternative way of beta-amyloid peptide production. Those authors suggest that RNA-dependent amplification of APP mRNA is the alternative (pathological) pathway of beta-amyloid peptide production (Volloch et al. 2020). In our opinion, the expression of short mRNAs possessing exons 16 and 17 (Nguyen 2014), as well as the autonomous transcription of an mRNA that contains only exons 16, 17, and 18 of the APP gene, may be even more probable mechanisms of beta-amyloid peptide accumulation. We provided two facts in support of our hypothesis of autonomous transcription: first, the nucleotide usage bias in the 3′-terminal part of the APP coding region differs from that in the rest of the sequence in many mammals and birds; second, the rates of nucleotide mutations in the introns surrounding the last three exons of that gene differ from those in the rest of its introns.
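A comparison of mutation rates between two groups of introns, as in the second fact, can be sketched in Python as follows. This is only an illustration under stated assumptions: the rate of a given transition is taken as the number of observed mutations of that type divided by the number of sites occupied by the original nucleotide, and the groups are compared with a pooled two-proportion z-test, which is not necessarily the test used in this study. The function names are hypothetical, and the counts in the usage lines are placeholders, not data from this study.

```python
import math


def transition_rate(n_mutations, n_sites):
    """Rate of a given substitution, e.g. T to C mutations per T site."""
    return n_mutations / n_sites


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test.

    k1, n1: mutations and sites in the first group of introns.
    k2, n2: mutations and sites in the second group of introns.
    Returns the z statistic and the two-sided p-value under the normal
    approximation.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts for illustration only (NOT taken from this study):
# 30 T to C mutations among 100 T sites versus 10 among 100 T sites.
z, p = two_proportion_z(30, 100, 10, 100)
print(f"rates: {transition_rate(30, 100):.2f} vs {transition_rate(10, 100):.2f}")
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The normal approximation is reasonable only when the counts in both groups are not too small; for short introns with few mutations, an exact test would be a safer choice.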