Evolutionary patterns of origin, expansion, conservation, and divergence of pan-primate endogenous retroviruses encoding human embryo regulatory LTR elements.
Evolutionary timelines of transitions of retroviruses from the state of exogenous infection agents to endogenous retroviral sequences integrated into host genomes could be inferred from the comparative quantitative analyses of highly conserved orthologous retrovirus-derived Long Terminal Repeat (LTR) loci in genomes of multiple distinct primate species which diverged from the last extinct common ancestor (ECA) at distinct time points during primates’ evolution (Glinsky, 2022). To accomplish this task, a total of 8839 fixed non-polymorphic sequences of human embryo regulatory LTRs residing in genomes of Modern Humans (hg38 human reference genome database) were retrieved and the number of highly conserved orthologous loci in genomes of sixteen non-human primates (NHP) were determined (Fig. 1). Present analyses were focused on human embryo regulatory LTR loci (see Introduction) the evolutionary origins of which could be mapped to genomes of Old World Monkeys (OWM), indicating that these genomic loci could be defined as highly-conserved pan-primate regulatory sequences because they have been preserved in primates’ genomes for ~ 30 MYA. Three distinct LTR families meet these criteria, namely MLT2A1 (2416 loci), MLT2A2 (3069 loci), and LTR7 (3354 loci). It has been observed that MLT2A1, MLTA2, and LTR7 appear to have similar evolutionary histories defined by evolutionary patterns of expansion, conservation, and divergence during primate evolution (Fig. 1A). However, they have clearly distinguishable histories of timelines of evolutionary origins, which are defined herein as the estimated timelines when retroviruses infect primates’ germlines and transition from the state of exogenous infection agents to endogenous retroviral sequences integrated into host genomes (Fig. 1A). The consistent earliest presence of highly-conserved pan-primate MLT2A1 and MLT2A2 loci could be mapped to genomes of both New World Monkeys (NWM) and OWM, indicating that both MLT2A1/HERVL and MLT2A2/HERVL retroviruses have infected primates’ germlines before the segregation of OWM and NWM lineages (Fig. 1A). In contrast, the consistent earliest presence in primates’ genomes of LTR7 loci could be mapped to genomes of OWM (Fig. 1A), suggesting that LTR7/HERVH retroviruses have entered germlines of the primate lineage after the separation of the NWM lineage (Glinsky, 2022). Pan-primate endogenous MLT2A1/HERVL, MLT2A2/HERVL, and LTR7/HERVH retroviruses appear to achieve a significant expansion in genomes of Great Apes, ranging from ~ 2-fold expansion for MLT2A2 loci to ~ 5-fold expansion for LTR7 loci (Fig. 1A). In contrast to highly-conserved pan-primate human embryo regulatory LTRs, the consistent earliest presence of highly-conserved LTR5_Hs loci has been observed in the Gibbon’s genome (Glinsky, 2022), in accord with the hypothesis that LTR5_Hs/HERVK retroviruses successfully colonized germlines of the primate lineage after the segregation of Gibbons’ species and subsequently underwent a marked expansion in genomes of Great Apes.
In agreement with the previously reported findings for highly-conserved LTR7 and LTR5_Hs loci (Glinsky, 2022), timeline sequences of quantitative expansion of human embryo pan-primate regulatory LTRs in different primate species appear to replicate the consensus evolutionary sequence of increasing cognitive and behavioral complexities during primate evolution, which seems particularly striking for LTR7 loci (Fig. 1A).
One of notable findings of LTR loci sequence conservation analyses is the observation that hundreds of highly-conserved pan-primate human embryo regulatory LTRs have DNA sequences divergent by more than 5% from orthologous sequences in genomes of Modern Humans in genomes of our closest evolutionary relatives, Chimpanzee and Bonobo. It was of interest to determine how many of fixed non-polymorphic human embryo regulatory LTR loci could be defined as candidate human-specific (HS) genomic regulatory elements compared to both Chimpanzee and Bonobo genomes. Setting a selection threshold at < 10% of sequence identity, there are 45; 35; and 175 sequences that could be classified as human-specific MLT2A2, MLT2A1, and LTR7 loci, respectively (Fig. 1B). A majority of candidate HS regulatory LTR loci of human embryo could be classified as bona fide human-specific insertions because they did not intersect any chains in genomes of both Chimpanzee and Bonobo.
Results of previously reported evolutionary conservation analyses indicated that nearly half (84/175; 48%) of candidate HS-LTR7 loci have been identified as highly-conserved orthologous sequences in genomes of OWM, Gibbon, Orangutan, and Gorilla (Glinsky, 2022), consistent with the hypothesis that these HS-LTR7 loci were not retained in genomes of Chimpanzee and Bonobo but preserved in genomes of Modern Humans. Similar to HS-LTR7 loci, the patterns of failure to retain human-specific DNA sequences have been observed in genomes of both Chimpanzee and Bonobo for candidate HS regulatory LTR elements encoded by MLT2A1 and MLT2A2 loci (Fig. 1B). Collectively, these observations strongly argue that species-specific loss of highly-conserved pan-primate regulatory DNA sequences encoded by regulatory LTRs may have contributed to the emergence during primate evolution of human-specific features of genomic regulatory networks (GRN).
Identification of regulatory LTR’s down-stream target genes and analysis of potential phenotypic impacts of human embryo regulatory LTR elements on pathophysiology of Modern Humans.
One of the main mechanisms defining biological functions of regulatory LTRs is their activity as distal genomic regulatory elements (GRE) affecting transcription of down-stream target genes. Therefore, the inference of potential biological functions of human embryo regulatory LTR elements could be made based on evaluations of experimentally documented biological activities of their down-stream target genes. To this end, the Genomic Regions Enrichment of Annotations Tool (GREAT) algorithm was employed (Methods) to identify putative down-stream target genes of human embryo regulatory LTRs. Concurrently with the identification of putative regulatory target genes of GREs, the GREAT algorithm performs stringent statistical enrichment analyses of functional annotations of identified down-stream target genes, thus enabling the inference of potential biological significance of interrogated GRNs. Importantly, the assignment of phenotypic traits as putative statistically valid components of GRN actions entails the assessments of statistical significance of the enrichment of both GREs and down-stream target genes.
The GREAT algorithm identified 5444 genes as potential down-stream regulatory targets of 8839 human embryo regulatory LTR elements (Tables 1–3; Supplementary Table S1). It has been observed that distinct families of human embryo regulatory LTR elements share overlapping sets of down-stream target genes, including 709 genes that were identified as the consensus set of candidate down-stream regulatory targets for MLT2A1; MLT2A2; and LTR7 elements (Table 1; Supplemental Table S1). At the statistical significance threshold of the binominal FDR Q value < 0.002 (Table 2), a total of 54 significantly enriched records of biological processes appear preferentially affected by the regulatory actions of 8839 human embryo regulatory LTRs (Table 2; Supplementary Table S1). Cell-cell adhesion via plasma-membrane adhesion molecules (GO:0098742; 677 LTRs linked with 95 genes; binominal FDR Q value = 2.65E-15), cell-cell adhesion (GO:0098609; 1089 LTRs linked with 238 genes; binominal FDR Q value = 3.18E-05), and homophilic cell adhesion (GO:0007156; 481 LTRs linked with 62 genes; binominal FDR Q value = 1,15E-12) via plasma membrane adhesion molecules have been identified among top-ranked biological processes associated with regulatory effects of human embryo regulatory LTR elements (Table 2; Supplementary Table S1).
Table 1
Numbers of human embryo regulatory LTR loci and their putative down-stream regulatory target genes in human genome.
Human embryo regulatory LTR families
|
Number of loci in human genome
|
Number of putative regulatory target genes
|
MLT2A1
|
2416
|
2450
|
MLT2A2
|
3069
|
2701
|
LTR7
|
3354
|
2955
|
MLT2A1 & MLTA2 (all loci)
|
5485
|
3971
|
LTR7 & MLT2A1 & MLT2A2 (all loci)
|
8839
|
5444
|
Human embryo regulatory LTR families
|
Number of loci in human genome
|
Number of common regulatory target genes
|
MLT2A1 & MLTA2 loci common regulatory target genes
|
5485
|
1180
|
MLTA1 & LTR7 loci common regulatory target genes
|
5770
|
1044
|
MLTA2 & LTR7 loci common regulatory target genes
|
6423
|
1149
|
LTR7 & MLT2A1 & MLT2A2 loci common regulatory target genes
|
8839
|
709
|
Table 2
Top-ranked biological processes preferentially affected by 8,839 human embryo regulatory LTRs identified by proximity placement-guided Genomic Regulatory Network (GRN) analysis* of Gene Ontology (GO) Biological Process database.
GO ID
|
Description
|
Binominal Rank
|
Binominal P value
|
Binominal Bonferroni P value
|
Binominal FDR Q value
|
LTR Fold Enrichment
|
Observed LTRs
|
Gene Fold Enrichment
|
Observed Genes
|
GO:0098742
|
cell-cell adhesion via plasma-membrane adhesion molecules
|
1
|
2.05E-19
|
2.65E-15
|
2.65E-15
|
1.42
|
677
|
1.46
|
95
|
GO:0007156
|
homophilic cell adhesion via plasma membrane adhesion molecules
|
2
|
1.79E-16
|
2.31E-12
|
1.15E-12
|
1.47
|
481
|
1.38
|
62
|
GO:0051965
|
positive regulation of synapse assembly
|
3
|
1.12E-12
|
1.45E-08
|
4.83E-09
|
1.48
|
355
|
2.12
|
40
|
GO:0006408
|
snRNA export from nucleus
|
4
|
1.03E-10
|
1.33E-06
|
3.33E-07
|
14.46
|
12
|
1.72
|
1
|
GO:0051963
|
regulation of synapse assembly
|
5
|
4.35E-10
|
5.62E-06
|
1.12E-06
|
1.38
|
392
|
2.05
|
51
|
GO:0051030
|
snRNA transport
|
6
|
8.87E-10
|
1.15E-05
|
1.91E-06
|
11.92
|
12
|
1.15
|
1
|
GO:0044691
|
tooth eruption
|
7
|
3.88E-09
|
5.02E-05
|
7.17E-06
|
3.74
|
29
|
2.30
|
2
|
GO:0050803
|
regulation of synapse structure or activity
|
8
|
1.21E-08
|
1.56E-04
|
1.95E-05
|
1.31
|
448
|
1.80
|
68
|
GO:0098609
|
cell-cell adhesion
|
9
|
2.21E-08
|
2.86E-04
|
3.18E-05
|
1.17
|
1089
|
1.29
|
238
|
GO:0050807
|
regulation of synapse organization
|
10
|
2.87E-08
|
3.71E-04
|
3.71E-05
|
1.30
|
440
|
1.82
|
66
|
GO:0098904
|
regulation of AV node cell action potential
|
11
|
8.99E-08
|
1.16E-03
|
1.06E-04
|
3.22
|
29
|
3.45
|
3
|
GO:0086072
|
AV node cell-bundle of His cell adhesion involved in cell communication
|
12
|
1.47E-07
|
1.90E-03
|
1.58E-04
|
3.89
|
22
|
3.45
|
1
|
GO:0007210
|
serotonin receptor signaling pathway
|
13
|
1.60E-07
|
2.07E-03
|
1.59E-04
|
2.21
|
54
|
3.45
|
7
|
GO:1904292
|
regulation of ERAD pathway
|
14
|
1.78E-07
|
2.30E-03
|
1.64E-04
|
2.13
|
58
|
0.89
|
7
|
GO:0046068
|
cGMP metabolic process
|
15
|
3.43E-07
|
4.43E-03
|
2.95E-04
|
1.78
|
90
|
1.56
|
14
|
GO:0051096
|
positive regulation of helicase activity
|
16
|
3.66E-07
|
4.72E-03
|
2.95E-04
|
3.16
|
27
|
1.15
|
2
|
GO:0007416
|
synapse assembly
|
17
|
3.77E-07
|
4.86E-03
|
2.86E-04
|
1.36
|
285
|
1.57
|
35
|
GO:0000710
|
meiotic mismatch repair
|
18
|
3.77E-07
|
4.87E-03
|
2.70E-04
|
8.73
|
10
|
1.72
|
1
|
GO:0034497
|
protein localization to pre-autophagosomal structure
|
19
|
4.80E-07
|
6.20E-03
|
3.27E-04
|
2.56
|
37
|
1.15
|
4
|
GO:0072378
|
blood coagulation, fibrin clot formation
|
20
|
4.90E-07
|
6.34E-03
|
3.17E-04
|
2.29
|
46
|
1.46
|
11
|
GO:0097104
|
postsynaptic membrane assembly
|
21
|
5.18E-07
|
6.69E-03
|
3.19E-04
|
1.79
|
86
|
2.07
|
6
|
GO:0099560
|
synaptic membrane adhesion
|
22
|
5.29E-07
|
6.83E-03
|
3.10E-04
|
2.05
|
59
|
3.45
|
3
|
GO:0006182
|
cGMP biosynthetic process
|
23
|
5.44E-07
|
7.03E-03
|
3.06E-04
|
1.99
|
63
|
1.29
|
9
|
GO:0007215
|
glutamate receptor signaling pathway
|
24
|
7.16E-07
|
9.25E-03
|
3.85E-04
|
1.48
|
173
|
2.12
|
27
|
GO:0098792
|
xenophagy
|
25
|
7.41E-07
|
9.57E-03
|
3.83E-04
|
5.79
|
13
|
0.86
|
1
|
GO:0046629
|
gamma-delta T cell activation
|
26
|
7.53E-07
|
9.72E-03
|
3.74E-04
|
3.30
|
24
|
1.15
|
2
|
GO:1904293
|
negative regulation of ERAD pathway
|
27
|
8.32E-07
|
1.07E-02
|
3.98E-04
|
2.43
|
39
|
0.80
|
3
|
GO:0052652
|
cyclic purine nucleotide metabolic process
|
28
|
1.19E-06
|
1.53E-02
|
5.47E-04
|
1.89
|
68
|
1.08
|
10
|
GO:1900371
|
regulation of purine nucleotide biosynthetic process
|
29
|
1.19E-06
|
1.54E-02
|
5.30E-04
|
1.36
|
260
|
1.47
|
67
|
GO:0006538
|
glutamate catabolic process
|
30
|
1.22E-06
|
1.58E-02
|
5.26E-04
|
2.28
|
43
|
1.72
|
4
|
GO:0009190
|
cyclic nucleotide biosynthetic process
|
31
|
1.41E-06
|
1.82E-02
|
5.86E-04
|
1.88
|
68
|
1.05
|
10
|
GO:0051962
|
positive regulation of nervous system development
|
32
|
1.47E-06
|
1.90E-02
|
5.95E-04
|
1.16
|
983
|
1.50
|
204
|
GO:0030808
|
regulation of nucleotide biosynthetic process
|
33
|
1.48E-06
|
1.91E-02
|
5.78E-04
|
1.35
|
260
|
1.46
|
67
|
GO:0097105
|
presynaptic membrane assembly
|
34
|
1.95E-06
|
2.52E-02
|
7.42E-04
|
1.79
|
77
|
1.92
|
5
|
GO:0051095
|
regulation of helicase activity
|
35
|
2.02E-06
|
2.61E-02
|
7.47E-04
|
2.88
|
27
|
0.77
|
2
|
GO:0034332
|
adherens junction organization
|
36
|
2.45E-06
|
3.17E-02
|
8.81E-04
|
1.39
|
216
|
1.39
|
29
|
GO:0007158
|
neuron cell-cell adhesion
|
37
|
2.60E-06
|
3.36E-02
|
9.09E-04
|
1.61
|
106
|
2.30
|
10
|
GO:0045409
|
negative regulation of interleukin-6 biosynthetic process
|
38
|
2.87E-06
|
3.70E-02
|
9.75E-04
|
4.43
|
15
|
2.07
|
3
|
GO:0010838
|
positive regulation of keratinocyte proliferation
|
39
|
3.44E-06
|
4.44E-02
|
1.14E-03
|
2.11
|
47
|
2.30
|
6
|
GO:0097090
|
presynaptic membrane organization
|
40
|
3.60E-06
|
4.65E-02
|
1.16E-03
|
1.76
|
77
|
1.72
|
5
|
GO:0007157
|
heterophilic cell-cell adhesion via plasma membrane cell adhesion molecules
|
41
|
4.17E-06
|
5.38E-02
|
1.31E-03
|
1.38
|
208
|
1.93
|
28
|
GO:0061739
|
protein lipidation involved in autophagosome assembly
|
42
|
4.22E-06
|
5.45E-02
|
1.30E-03
|
4.58
|
14
|
1.72
|
2
|
GO:0070295
|
renal water absorption
|
43
|
4.45E-06
|
5.75E-02
|
1.34E-03
|
3.25
|
21
|
0.86
|
1
|
GO:0045656
|
negative regulation of monocyte differentiation
|
44
|
4.62E-06
|
5.97E-02
|
1.36E-03
|
2.64
|
29
|
2.46
|
5
|
GO:0009065
|
glutamine family amino acid catabolic process
|
45
|
5.01E-06
|
6.47E-02
|
1.44E-03
|
1.92
|
57
|
1.38
|
10
|
GO:0045659
|
negative regulation of neutrophil differentiation
|
46
|
5.89E-06
|
7.61E-02
|
1.65E-03
|
3.57
|
18
|
3.45
|
2
|
GO:0003097
|
renal water transport
|
47
|
6.18E-06
|
7.99E-02
|
1.70E-03
|
3.18
|
21
|
0.69
|
1
|
GO:0016074
|
snoRNA metabolic process
|
48
|
6.30E-06
|
8.14E-02
|
1.69E-03
|
2.98
|
23
|
1.72
|
7
|
GO:1902567
|
negative regulation of eosinophil activation
|
49
|
6.38E-06
|
8.25E-02
|
1.68E-03
|
10.53
|
7
|
3.45
|
2
|
GO:0045658
|
regulation of neutrophil differentiation
|
50
|
6.72E-06
|
8.69E-02
|
1.74E-03
|
3.54
|
18
|
2.30
|
2
|
GO:0006536
|
glutamate metabolic process
|
51
|
7.24E-06
|
9.35E-02
|
1.83E-03
|
1.83
|
63
|
1.19
|
10
|
GO:0030853
|
negative regulation of granulocyte differentiation
|
52
|
7.30E-06
|
9.43E-02
|
1.81E-03
|
3.14
|
21
|
1.29
|
3
|
GO:0071362
|
cellular response to ether
|
53
|
7.34E-06
|
9.48E-02
|
1.79E-03
|
2.80
|
25
|
2.07
|
3
|
GO:1904637
|
cellular response to ionomycin
|
54
|
7.82E-06
|
1.01E-01
|
1.87E-03
|
2.94
|
23
|
1.72
|
2
|
Legend: *Proximity-placement-guided Genomic Regulatory Network (GRN) analyses were performed employing the GREAT algorithm software as described in Methods. |
Table 3
Quantitative profiles of regulatory LTR elements, down-stream target genes, and genomic regulatory modules (GRMs) * within 26 gene ontology enrichment analysis-defined genomic regulatory networks (GRNs) ** of the genomic dominion of human embryo regulatory LTRs.
Gene ontology ID
|
Genomic regulatory networks
|
Number of LTRs
|
Number of genes
|
GO Biological Process
|
GO Cellular Component
|
GO Molecular Function
|
Human Phenotype
|
Mouse Phenotype Single KO
|
Mouse Phenotype
|
Number of GRNs & GRMs
|
NA
|
Genomic dominion of human embryo regulatory LTRs
|
8839
|
5444
|
38
|
20
|
15
|
7
|
6
|
9
|
95
|
GO:0051960
|
Regulation of nervous system development
|
1424
|
716
|
1676
|
110
|
132
|
261
|
1028
|
1413
|
4620
|
GO:0051962
|
Positive regulation of nervous system development
|
983
|
443
|
1390
|
103
|
115
|
178
|
887
|
1121
|
3794
|
GO:0098609
|
Cell-cell adhesion
|
1089
|
492
|
787
|
77
|
51
|
135
|
491
|
707
|
2231
|
GO:0098742
|
Cell-cell adhesion via plasma-membrane adhesion molecules
|
677
|
199
|
375
|
46
|
44
|
52
|
299
|
326
|
1142
|
GO:0099537
|
Trans-synaptic signaling
|
616
|
344
|
919
|
112
|
146
|
142
|
599
|
781
|
2699
|
GO:0007268
|
Chemical synaptic transmission
|
604
|
332
|
888
|
110
|
145
|
142
|
564
|
725
|
2574
|
GO:0050804
|
Modulation of synaptic transmission
|
529
|
298
|
1253
|
118
|
161
|
154
|
625
|
768
|
3079
|
GO:0044708
|
Single organism behavior
|
674
|
372
|
1336
|
91
|
111
|
196
|
718
|
843
|
3295
|
GO:0042391
|
Regulation of membrane potential
|
592
|
340
|
1315
|
145
|
203
|
387
|
732
|
822
|
3604
|
GO:1901137
|
Carbohydrate derivative biosynthetic process
|
678
|
448
|
526
|
41
|
148
|
70
|
218
|
249
|
1252
|
GO:0043005
|
Neuron projection
|
1486
|
805
|
1526
|
188
|
200
|
163
|
868
|
981
|
3926
|
GO:0098590
|
Plasma membrane region
|
1310
|
788
|
1016
|
193
|
247
|
75
|
642
|
833
|
3006
|
GO:0044456
|
Synapse part
|
1089
|
604
|
951
|
167
|
196
|
136
|
611
|
732
|
2793
|
GO:0009986
|
Cell surface
|
921
|
560
|
1965
|
142
|
172
|
137
|
1048
|
1164
|
4628
|
GO:0098794
|
Postsynapse
|
826
|
406
|
842
|
159
|
170
|
148
|
588
|
727
|
2634
|
GO:0044297
|
Cell body
|
737
|
409
|
1644
|
204
|
198
|
163
|
881
|
977
|
4067
|
GO:0097060
|
Synaptic membrane
|
679
|
305
|
664
|
133
|
169
|
138
|
572
|
680
|
2356
|
GO:0045211
|
Postsynaptic membrane
|
577
|
240
|
515
|
116
|
148
|
136
|
543
|
646
|
2104
|
GO:0036477
|
Somatodendritic compartment
|
995
|
582
|
1640
|
180
|
208
|
246
|
898
|
999
|
4171
|
GO:0043025
|
Neuronal cell body
|
645
|
359
|
1480
|
174
|
176
|
191
|
855
|
874
|
3750
|
GO:0030424
|
Axon
|
651
|
335
|
1029
|
152
|
130
|
143
|
465
|
562
|
2481
|
GO:0031012
|
Extracellular matrix
|
674
|
395
|
259
|
57
|
97
|
309
|
402
|
405
|
1529
|
GO:0005578
|
Proteinaceous extracellular matrix
|
554
|
311
|
356
|
40
|
87
|
232
|
229
|
245
|
1189
|
GO:0005509
|
Calcium ion binding
|
979
|
462
|
397
|
26
|
98
|
43
|
192
|
212
|
968
|
GO:0008233
|
Peptidase activity
|
574
|
400
|
303
|
30
|
83
|
71
|
175
|
204
|
849
|
GO:0070011
|
Peptidase activity, acting on L-amino acid peptides
|
547
|
384
|
306
|
29
|
83
|
47
|
162
|
197
|
824
|
NA
|
Cumulative, within
26 genomic regulatory networks
|
5237
|
3054
|
1105
|
121
|
190
|
71
|
424
|
486
|
2397
|
NA
|
Cumulative, excluding 26 genomic regulatory networks
|
3602
|
2685
|
27
|
12
|
23
|
17
|
26
|
24
|
129
|
Legend: *For each GRN, reported numbers of LTRs, genes, and GRMs were determined at the statistical significance threshold of the Binominal FDR Q-Value < 0.001 applied to 6 independent gene ontology-specific enrichment analyses employing the GREAT algorithm (Supplementary Materials); **, A total of 26 gene ontology enrichment analyses-defined GRNs associated with human embryo regulatory LTRs were identified at the statistical significance threshold of the Binominal FDR Q-value < 0.05 (Supplementary Table S1). GRNs contain from 529 to 1486 regulatory LTR elements and from 199 to 805 down-stream target genes. |
Interestingly, neuron cell-cell adhesion (GO:0007158) has been scored as one of significantly enriched cell type-specific GO categories (106 LTRs linked to 10 genes; binominal FDR Q-value = 9.09E-04). Multiple distinct GO categories related to biogenesis, structure, and functions of synapses appear preferential regulatory targets of human embryo regulatory LTRs (Table 2; Supplementary Table S1). These include regulation of synapse assembly (GO:0051963); regulation of synapse structure or activity (GO:0050803); synapse organization (GO:0050807); synapse assembly (GO:0007416); and synapse membrane adhesion (GO:0099560). Several findings suggest that putative regulatory actions of human embryo regulatory LTRs may affect fine-tuned biological structures and activities of synapses, including the presynaptic membrane assembly (GO:0097105; binominal FDR Q value = 7.42E-04) and organization (GO:0097090; 77 LTRs linked to 5 genes; binominal FDR Q value = 1.16E-03); postsynaptic membrane assembly (GO:0097104; 86 LTRs linked to 6 genes; binominal FDR Q value = 3.19E-04) as well as positive regulation of synapse assembly (GO:0051965; 355 LTRs linked to 40 genes; binominal FDR Q value = 4.83E-09). Notably, regulatory LTR elements and genes associated with the GO category of positive regulation of synapse assembly (GO:0051965) constitute a marked majority of regulatory LTRs and genes assigned to more general GO category of regulation of synapse assembly (GO:0051963): 91% (355/392) of LTRs and 78% (40/51) of genes, respectively (Table 2; Supplementary Table S1). Overall, the GO category of regulation of nervous system development (GO:0051960; binominal FDR Q value = 4,38E-02) has been linked with 1424 regulatory LTRs and 716 target genes, 333 of which have been implicated in development of nervous system (Supplementary Table S1).
Collectively, these findings suggest that biological processes of nervous system development and functions, in particular, biogenesis and activity of synapses, may represent one of important evolutionary targets of regulatory actions of human embryo regulatory LTR elements. Consistent with this hypothesis, significantly enriched GO category of positive regulation of nervous system development (GO:0051962; binominal FDR Q value = 5.95E-04) has been linked with 983 regulatory LTRs and 204 target genes (Table 2; Supplementary Table S1), while significantly enriched GO category of glutamate receptor signaling pathway (GO:0007215; binominal FDR Q value = 3.85E-04) has been associated with 173 regulatory LTRs and 27 target genes (Table 2; Supplementary Table S1). In contrast, GO category of negative regulation of nervous system development (GO:0051962) has been linked with 483 regulatory LTRs and 119 target genes known to exert negative effects on development of nervous system (Supplementary Table S1).
Furthermore, the validity of the concept of human embryo regulatory LTRs’ regulatory impacts on nervous system development and functions is supported by the identification of numerous significantly enriched phenotypic traits of biological relevance assigned by GO Cellular Component and by GO Molecular Function annotations as well as Mouse Phenotype Single KO and Mouse Phenotype databases (Supplementary Table S1). Notable observations of potential biological relevance include identification of serotonin receptor signaling pathway (GO:0007210; 54 LTRs linked to 7 genes; binominal FDR Q-value = 0.00016) and Y-linked inheritance (HP:0001450; 80 LTRs linked to 14 genes; binominal FDR Q-value = 5.60E-07) that were scored by the GREAT algorithm as the significantly enriched records of GO Biological Process and Human Phenotype databases-defined phenotypic traits, respectively (Supplementary Table S1). Among multiple significantly enriched phenotypes’ classification categories, GO Cellular Component annotations identified neuron projection (GO:0043005; 1486 LTRs linked to 386 genes) and synaptic membrane (GO:0097060; 679 LTRs linked to 132 genes), while GO Molecular Function analysis scored glutamate receptor activity (GO:0008066; 152 LTRs linked to 19 genes); neurotransmitter receptor activity (GO:0030594; 220 LTRs linked to 43 genes); neurexin family protein binding (GO:0042043; 58 LTRs linked to 7 genes); and calcium ion binding (GO:0005509; 979 LTRs linked to 243 genes) (Supplementary Table S1). Interestingly, structural elements of the postsynaptic membrane defined by the GO Cellular Component category (GO:0045211; 577 LTRs linked to 102 genes; binominal FDR Q-value = 1.98E-04) appears preferentially targeted by regulatory actions of human embryo regulatory LTRs compared to the presynaptic membrane features (GO:0042734; 186 LTRs linked to 36 genes; binominal FDR Q-value = 0.042) (Supplementary Table S1). The extended summary illustrating quantitative and qualitative distinctions of the regulatory LTR elements’ associations with GO analyses-defined features of the postsynaptic membrane versus the presynaptic membrane is reported in the Supplementary Summary S1.
Identification and characterization of 26 gene ontology-defined genomic regulatory networks (GRNs) facilitating functions of a genomic regulatory dominion (GRD) of human embryo regulatory LTR elements.
Results of Gene Ontology terms’ enrichment analyses indicate that a set of 5444 genes constituting candidate down-stream targets of human embryo regulatory LTRs appears segregated into multiple genomic regulatory networks (GRNs) defined by a single ontology-specific enrichment analysis (Fig. 1; Tables 2–5; Supplementary Table S1). Identification of GRNs is based on analyses of regulatory LTRs and down-stream target genes distribution patterns associated with each of significantly-enriched terms (Binominal FDR Q value < 0.05) identified by the GREAT algorithm during analyses of GO Biological Process (227 significant terms), GO Cellular Component (47 significant terms), and GO Molecular Function (80 significant terms) databases (Fig. 1). It has been observed that for a set of single ontology-defined significantly-enriched terms there is a significant positive correlation (R2 > 0.99) between genome fractions occupied by genomic features assigned to a set of significantly-enriched terms (referred to as the genomic modicums) and proportions of LTR elements residing within the genomic modicums (referred to as the LTR set coverage). A compendium of regulatory LTRs and down-stream target genes assigned to the individual significantly-enriched GO terms, genomic modicums of which were occupying more than 5% of human genome and harbor at least 6% of human embryo regulatory LTRs, were designated as GRNs (Fig. 1; Tables 2–5; Supplementary Table S1).
Table 5
Quantitative profiles of a directed acyclic graph (DAG) * test-validated genomic regulatory modules (GRMs) ** within 26 gene ontology enrichment analysis-defined genomic regulatory networks (GRNs) *** of the genomic dominion of human embryo regulatory LTRs.
Gene ontology ID
|
Genomic regulatory networks
|
Number of LTRs
|
GRN linked genes
|
GO Biological Process
|
GO Cellular Component
|
GO Molecular Function
|
Human Phenotype
|
Mouse Phenotype Single KO
|
Mouse Phenotype
|
Number of GRNs & GRMs
|
NA
|
Genomic dominion of human embryo regulatory LTRs
|
8839
|
NA
|
1
|
0
|
0
|
1
|
0
|
0
|
2
|
GO:0051960
|
Regulation of nervous system development
|
1424
|
333
|
972
|
46
|
31
|
50
|
521
|
838
|
2458
|
GO:0051962
|
Positive regulation of nervous system development
|
983
|
204
|
800
|
39
|
19
|
27
|
401
|
598
|
1884
|
GO:0098609
|
Cell-cell adhesion
|
1089
|
238
|
470
|
64
|
14
|
52
|
230
|
369
|
1199
|
GO:0098742
|
Cell-cell adhesion via plasma-membrane adhesion molecules
|
677
|
95
|
69
|
34
|
2
|
0
|
1
|
1
|
107
|
GO:0099537
|
Trans-synaptic signaling
|
616
|
151
|
482
|
64
|
102
|
6
|
322
|
435
|
1411
|
GO:0007268
|
Chemical synaptic transmission
|
604
|
146
|
440
|
60
|
96
|
7
|
253
|
356
|
1212
|
GO:0050804
|
Modulation of synaptic transmission
|
529
|
130
|
746
|
71
|
42
|
18
|
279
|
357
|
1513
|
GO:0044708
|
Single organism behavior
|
674
|
165
|
638
|
59
|
19
|
10
|
344
|
467
|
1537
|
GO:0042391
|
Regulation of membrane potential
|
592
|
153
|
620
|
103
|
90
|
82
|
217
|
268
|
1380
|
GO:1901137
|
Carbohydrate derivative biosynthetic process
|
678
|
209
|
215
|
26
|
52
|
0
|
0
|
0
|
293
|
GO:0043005
|
Neuron projection
|
1486
|
386
|
606
|
107
|
58
|
28
|
335
|
419
|
1553
|
GO:0098590
|
Plasma membrane region
|
1310
|
371
|
433
|
153
|
110
|
0
|
266
|
345
|
1307
|
GO:0044456
|
Synapse part
|
1089
|
270
|
468
|
118
|
83
|
4
|
238
|
320
|
1231
|
GO:0009986
|
Cell surface
|
921
|
267
|
854
|
96
|
43
|
3
|
394
|
523
|
1913
|
GO:0098794
|
Postsynapse
|
826
|
179
|
361
|
104
|
68
|
0
|
229
|
279
|
1041
|
GO:0044297
|
Cell body
|
737
|
188
|
536
|
81
|
50
|
32
|
190
|
250
|
1139
|
GO:0097060
|
Synaptic membrane
|
679
|
132
|
209
|
83
|
64
|
0
|
162
|
220
|
738
|
GO:0045211
|
Postsynaptic membrane
|
577
|
102
|
154
|
72
|
62
|
0
|
146
|
167
|
601
|
GO:0036477
|
Somatodendritic compartment
|
995
|
286
|
599
|
96
|
64
|
62
|
245
|
361
|
1427
|
GO:0043025
|
Neuronal cell body
|
645
|
165
|
464
|
71
|
45
|
51
|
170
|
204
|
1005
|
GO:0030424
|
Axon
|
651
|
158
|
516
|
93
|
50
|
0
|
161
|
208
|
1028
|
GO:0031012
|
Extracellular matrix
|
674
|
187
|
211
|
31
|
43
|
178
|
146
|
163
|
772
|
GO:0005578
|
Proteinaceous extracellular matrix
|
554
|
145
|
136
|
29
|
41
|
119
|
86
|
96
|
507
|
GO:0005509
|
Calcium ion binding
|
979
|
236
|
101
|
7
|
36
|
0
|
9
|
5
|
158
|
GO:0008233
|
Peptidase activity
|
574
|
199
|
50
|
6
|
38
|
0
|
0
|
0
|
94
|
GO:0070011
|
Peptidase activity, acting on L-amino acid peptides
|
547
|
191
|
49
|
5
|
39
|
0
|
0
|
0
|
93
|
NA
|
Cumulative, within
26 genomic regulatory networks
|
5237
|
ND
|
173
|
21
|
37
|
8
|
68
|
68
|
375
|
NA
|
Cumulative, excluding 26 genomic regulatory networks
|
3602
|
0
|
0
|
0
|
1
|
1
|
0
|
0
|
2
|
Legend: *, A directed acyclic graph (DAG) test was carried out on the enriched terms from a single ontology-specific table generated by the GREAT algorithm; |
**, For each GRN, reported numbers of GRMs were determined using a directed acyclic graph (DAG) test applied to 6 independent gene ontology-specific enrichment analyses employing the GREAT algorithm (Supplementary Materials); ***, A total of 26 gene ontology enrichment analyses-defined GRNs associated with human embryo regulatory LTRs were identified at the statistical significance threshold of the Binominal FDR Q-value < 0.05 (Supplementary Table S1). GRNs contain from 529 to 1486 regulatory LTR elements and from 199 to 805 down-stream target genes. |
In total, there are 26 gene ontology-defined GRNs associated with human embryo regulatory LTRs each of which contain more than 500 regulatory LTR elements and more than 199 down-stream target genes (Fig. 1; Tables 3–5; Supplementary Table S1). Genomic locations of GRNs appear scattered across chromosomes to occupy from 5.5–15.09% of the human genome. GRNs integrate activities from 529 to 1486 human embryo regulatory LTR elements and from 199 to 805 down-stream target genes, including transcription factors, chromatin-state remodelers, sensing and signal transduction mediators, enzymatic and receptor binding effectors, intracellular complexes and extracellular matrix elements, and cell-cell adhesion molecules. One of notable common structural features of GRNs is that numbers of regulatory LTR elements derived from LTR7, MLT2A1, and MLT2A2 sequences residing within the boundaries of GRN’s genomic modicums are quantitatively balanced in strict accordance to their genome-wide abundance levels (Table 4).
Table 4
Quantitative profiles of regulatory LTR elements and down-stream target genes within 26 gene ontology enrichment analysis-defined genomic regulatory networks (GRNs) of the genomic dominion of human embryo regulatory LTRs.
Gene ontology ID
|
Genomic regulatory networks
|
Number, LTR7
|
Number, MLT2A1
|
Number, MLT2A2
|
Number of linked genes
|
Percent, LTR7
|
Percent, MLT2A1
|
Percent, MLT2A2
|
NA
|
Genomic dominion of human embryo regulatory LTRs
|
3354
|
2416
|
3069
|
5444
|
100.0
|
100.0
|
100.0
|
GO:0051960
|
Regulation of nervous system development
|
500
|
409
|
515
|
716
|
14.9
|
16.9
|
16.8
|
GO:0051962
|
Positive regulation of nervous system development
|
333
|
290
|
360
|
443
|
9.9
|
12.0
|
11.7
|
GO:0098609
|
Cell-cell adhesion
|
385
|
301
|
403
|
492
|
11.5
|
12.5
|
13.1
|
GO:0098742
|
Cell-cell adhesion via plasma-membrane adhesion molecules
|
235
|
186
|
256
|
199
|
7.0
|
7.7
|
8.3
|
GO:0099537
|
Trans-synaptic signaling
|
215
|
161
|
240
|
344
|
6.4
|
6.7
|
7.8
|
GO:0007268
|
Chemical synaptic transmission
|
211
|
158
|
235
|
332
|
6.3
|
6.5
|
7.7
|
GO:0050804
|
Modulation of synaptic transmission
|
174
|
133
|
222
|
298
|
5.2
|
5.5
|
7.2
|
GO:0044708
|
Single organism behavior
|
230
|
185
|
259
|
372
|
6.9
|
7.7
|
8.4
|
GO:0042391
|
Regulation of membrane potential
|
217
|
151
|
224
|
340
|
6.5
|
6.3
|
7.3
|
GO:1901137
|
Carbohydrate derivative biosynthetic process
|
261
|
175
|
242
|
448
|
7.8
|
7.2
|
7.9
|
GO:0043005
|
Neuron projection
|
542
|
411
|
533
|
805
|
16.2
|
17.0
|
17.4
|
GO:0098590
|
Plasma membrane region
|
519
|
337
|
454
|
788
|
15.5
|
13.9
|
14.8
|
GO:0044456
|
Synapse part
|
373
|
293
|
423
|
604
|
11.1
|
12.1
|
13.8
|
GO:0009986
|
Cell surface
|
370
|
239
|
312
|
560
|
11.0
|
9.9
|
10.2
|
GO:0098794
|
Postsynapse
|
276
|
229
|
321
|
406
|
8.2
|
9.5
|
10.5
|
GO:0044297
|
Cell body
|
275
|
204
|
258
|
409
|
8.2
|
8.4
|
8.4
|
GO:0097060
|
Synaptic membrane
|
232
|
182
|
265
|
305
|
6.9
|
7.5
|
8.6
|
GO:0045211
|
Postsynaptic membrane
|
195
|
156
|
226
|
240
|
5.8
|
6.5
|
7.4
|
GO:0036477
|
Somatodendritic compartment
|
346
|
281
|
368
|
582
|
10.3
|
11.6
|
12.0
|
GO:0043025
|
Neuronal cell body
|
225
|
188
|
232
|
359
|
6.7
|
7.8
|
7.6
|
GO:0030424
|
Axon
|
246
|
174
|
231
|
335
|
7.3
|
7.2
|
7.5
|
GO:0031012
|
Extracellular matrix
|
244
|
199
|
231
|
395
|
7.3
|
8.2
|
7.5
|
GO:0005578
|
Proteinaceous extracellular matrix
|
191
|
176
|
187
|
311
|
5.7
|
7.3
|
6.1
|
GO:0005509
|
Calcium ion binding
|
354
|
228
|
397
|
462
|
10.6
|
9.4
|
12.9
|
GO:0008233
|
Peptidase activity
|
240
|
158
|
176
|
400
|
7.2
|
6.5
|
5.7
|
GO:0070011
|
Peptidase activity, acting on L-amino acid peptides
|
226
|
152
|
169
|
384
|
6.7
|
6.3
|
5.5
|
NA
|
Cumulative, within
26 genomic regulatory networks
|
1952
|
1393
|
1892
|
3054
|
58.2
|
57.7
|
61.6
|
NA
|
Cumulative, excluding
26 genomic regulatory networks
|
1402
|
1023
|
1177
|
2685
|
41.8
|
42.3
|
38.4
|
Legend: *For each GRN, reported numbers of LTRs and genes were determined at the statistical significance threshold of the Binominal FDR Q-Value < 0.001 applied to 6 independent gene ontology-specific enrichment analyses employing the GREAT algorithm (Supplementary Materials); **, A total of 26 gene ontology enrichment analyses-defined GRNs associated with human embryo regulatory LTRs were identified at the statistical significance threshold of the Binominal FDR Q-value < 0.05 (Supplementary Table S1). GRNs contain from 529 to 1486 regulatory LTR elements and from 199 to 805 down-stream target genes. |
Follow-up gene ontology enrichment analyses of 26 GRNs revealed further details of their genotype-phenotype association-defined architectural complexity: GRNs consist from several hundred to thousands quantitatively smaller gene ontology enrichment analysis-defined genomic regulatory modules (GRMs) (Table 3; Supplementary Table S1; Supplementary Data Set S1: GRN1-GRN26). Importantly, to increase the stringency of statistical definitions of candidate regulatory LTR-associated GRMs, the cut-off threshold of the Binominal FDR Q-value < 0.001 was applied in these series of GO terms enrichment analyses. One of the implications of this approach is that the false discovery rates (FDR) in these analyses could be estimated at less than 0.1%, thus indicating that the reported observations have more than 99.9% likelihood of being true findings. Based on the results of these analyses, it has been concluded that genomic constituents of each GRN comprise of several hundred to thousands smaller gene ontology enrichment analysis-defined GRMs, each of which contains from a dozen to hundreds LTRs and down-stream target genes. Overall, these analyses identified 69,573 statistically significant regulatory LTR-linked GRMs (Tables 3–4; Supplementary Table S1; Supplementary Data Set S1: GRN1-GRN26), including 27,601 GRMs validated by the single ontology-specific directed acyclic graph (DAG) analyses across 6 gene ontology annotations databases (Table 5; Supplementary Table S1; Supplementary Data Set S1: GRN1-GRN26). It has been determined that within each GRN there is a significant positive correlation between the human genome fractions occupied by the GRMs and LTR set coverage for the GRMs (Supplementary Summary S2).
These observations suggest that human embryo regulatory LTRs may exert regulatory effects on thousands distinct GRMs containing functionally and/or structurally-related sub-sets of genes affecting multiple facets of Modern Humans’ developmental processes as well as physiological and pathological traits, in particular, a constellation of biological processes, cellular components, and molecular activities affecting development and functions of the nervous system.
Selected examples of GRMs and associated phenotypic traits identified by GO analyses of specific regulatory LTR-governed GRNs.
Potential phenotypic impacts on human pathophysiology of human embryo regulatory LTR-governed GRNs could be inferred by analyses of significantly enriched genotype-phenotype associations annotated to their composite GRMs. The GRN “Regulation of Membrane Potential” have the largest number of significantly-enriched GRMs and associated phenotypic traits identified by GO analyses of the Human Phenotype database (Tables 3–5; Supplementary Data Set S1). A set of 79 regulatory LTRs linked to 10 genes have been identified that were annotated to GRM X-linked inheritance (HP:0001417) and GRM Gonosomal inheritance (HP:0010985) at Binominal FDR Q-values of 2.82E-29 and 1.86E-25, respectively. Multiple significantly-enriched GRMs are associated with human phenotypes of major clinical importance. These include: Autism (HP:0000717; 57 LTRs linked to 7 genes; Binominal FDR Q-value = 3.39E-33); and Autistic behavior (HP:0000729; 65 LTRs linked to 11 genes; Binominal FDR Q-value = 2.10E-26); Parkinsonism (HP:0001300; 26 LTRs linked to 8 genes; Binominal FDR Q value = 1.71E-11); Dementia (HP:0000726; 26 LTRs linked to 7 genes; Binominal FDR Q-value = 1.20E-09); Intellectual disability (HP:0001249; 179 LTRs linked to 43 genes; Binominal FDR Q value = 4.28E-31) and a severe form of Intellectual disability (HP:0010864; 52 LTRs linked to 8 genes; Binominal FDR Q value = 2.13E-24); Syncope (HP:0001279; 35 LTRs linked to 11 genes; Binominal FDR Q value = 5.46E-24); Abnormality of cardiovascular system electrophysiology (HP:0030956; 64 LTRs linked to 22 genes; Binominal FDR Q value = 1.96E-26); and Seizures (HP:0001250; 160 LTRs linked to 36 genes; Binominal FDR Q value = 5.19E-33). The extended summary illustrating selected findings derived from GO analyses of the Human Phenotype database related to human disorders is reported in the Supplementary Summary S3.
GO Biological Process analyses identified multiple GRMs representing common significantly-enriched phenotypic traits associated with numerous GRNs. Several GRMs common for the GRN “Synapse Part” and the GRN “Postsynapse” include regulation of amyloid precursor protein catabolic process; regulation of beta-amyloid formation; positive regulation of beta-amyloid formation; response to beta-amyloid; and cellular response to beta-amyloid.
GRNs “Synaptic Membrane” and “Postsynaptic Membrane” have several common GRMs, namely regulation of amyloid precursor protein catabolic process; regulation of beta-amyloid formation; and positive regulation of beta-amyloid formation. Similarly, common GRMs for the GRN “Trans-synaptic Signaling” and the GRN “Chemical Synaptic Transmission” are beta-amyloid metabolic process; response to beta-amyloid; and cellular response to beta-amyloid; while the GRM “Regulation of Membrane Potential” harbors a set of functionally-related GRMs, namely regulation of beta-amyloid formation; positive regulation of beta-amyloid formation; response to beta-amyloid; and cellular response to beta-amyloid.
GO Molecular Function analyses identified several GRMs representing common significantly-enriched phenotypic traits associated with seven GRNs, namely “Synapse Part”; “Postsynapse”; “Synaptic Membrane”; “Postsynaptic Membrane”; “Trans-synaptic Signaling”; “Chemical Synaptic Transmission”; “Regulation of Membrane Potential”. A set of common GRMs shared by these GRNs include beta-amyloid binding; ionotropic glutamate receptor activity; kainate selective glutamate receptor; NMDA glutamate receptor; AMPA glutamate receptor. Interestingly, the GO Molecular Function terms ionotropic glutamate receptor activity; kainate selective glutamate receptor; NMDA glutamate receptor; AMPA glutamate receptor represent the exactly same selection of phenotypic traits that were identified as down-stream regulatory targets of human-specific LTR7 and LTR5_Hs loci (Glinsky, 2022).
Overall, genes linked to phenotypic traits reflecting various aspects of physiological and pathological processes associated with the biogenesis, metabolism, and functions of the amyloid precursor protein and beta-amyloid peptides appear to represent one of the down-stream targets of preference during the evolution of human embryo regulatory LTR elements.
Validation of GRNs and GRMs governed by human embryo regulatory LTR elements employing the directed acyclic graph (DAG) analysis.
It was important to confirm the validity of statistical definitions of GRNs and GRMs based on the binominal and hypergeometrc FDR Q values (Methods) employing independent analytical approaches. To this end, a directed acyclic graph (DAG) test based on the enriched terms from a single ontology-specific table generated by a GREAT algorithm has been employed (Figs. 2–3; Table 3; Supplementary Data Sets S1: GRN1-GRN26). Patterns and directions of connections between significantly enriched GO modules are guided by the experimentally-documented temporal logic of developmental processes and structural/functional relationships between gene ontology enrichment analysis-defined statistically significant terms. Therefore, only statistically significant GRMs from a single gene ontology-specific table generated by the GREAT algorithm that manifest connectivity patterns defined by experimentally documented developmental and/or structure/function/activity relationships are deemed valid observations and visualized as a consensus hierarchy network of the ontology-specific DAGs (Figs. 2–3; Tables 3–5; Supplementary Data Sets S1: GRN1-GRN26). Based on these considerations, the DAG algorithm draw the developmental and structure/function/activity relationships-guided hierarchy of connectivity between statistically significant gene ontology enriched GRMs. Visualization snapshots of representative examples illustrating the individual ontology-specific DAGs are reported in the Figs. 2–3 and comprehensive reports of quantitative and qualitative components of these analyses are presented in Tables 3–5; Supplementary Data Sets S1: GRN1-GRN26.
GO analyses of Mouse phenotype databases and GSEA of the MGI Mammalian Phenotype Level 4 2021 database revealed that genes comprising down-stream regulatory targets of human embryo regulatory LTRs affect development and functions of essentially all major tissues, organs, and organ systems. These include central and peripheral nervous systems, cardio-vascular (circulatory) and lymphatic systems, gastrointestinal (digestive) and urinary (excretory) systems, musculoskeletal and respiratory systems, endocrine and immune systems, as well as integumentary and reproductive systems. Analyses of single gene KO models revealed profound developmental defects and numerous examples of organogenesis failures, indicating that activities of regulatory LTR down-stream target genes are essential for development and functions of major organs and organ systems.
Of potentially major interest are observations that all 26 GRNs governed by human embryo regulatory LTR elements harbor significantly enriched records of GRMs annotated to multiple distinct GO terms of Mammalian Offspring’s Survival (MOS) phenotypes (Table 6; Supplementary Data Set S1: GRN1-GRN26). For example, GRN2 “Regulation of nervous system development” (GO:0051960) harbors numerous significantly enriched GRMs annotated to distinct MOS phenotypes, including: neonatal lethality, complete penetrance and neonatal lethality, incomplete penetrance; lethality at weaning, complete penetrance; perinatal lethality, complete penetrance; postnatal lethality, complete penetrance and postnatal lethality, incomplete penetrance; embryonic lethality during organogenesis and embryonic lethality during organogenesis, complete penetrance; lethality throughout fetal growth and development and lethality throughout fetal growth and development, incomplete penetrance; lethality at weaning; perinatal, neonatal, and postnatal lethality phenotypes (Table 6; Supplementary Data Set S1: GRN2). Other notable GRMs of regulatory LTR-linked GRNs that were annotated to significantly enriched MOS phenotypes include: embryonic lethality between implantation and placentation, complete penetrance; embryonic lethality between implantation and somite formation, incomplete penetrance; embryonic lethality between somite formation and embryo turning, incomplete penetrance; prenatal lethality prior to heart atrial septation; embryonic lethality, complete penetrance (Table 6; Supplementary Data Set S1: GRN1-GRN26).
Table 6
Genes regulating Mammalian Offspring’s Survival (MOS) phenotypes are intrinsic components of twenty six Genomic Regulatory Networks (GRNs) governed by human embryo regulatory LTR elements.
GRN ID
|
GO Category
|
GO ID
|
GRN GO Name
|
Number of MOS phenotypes*
|
Top scoring MOS phenotype
|
GO ID
|
Binominal Rank
|
Binominal FDR Q-value
|
GRN1
|
GO Cellular Component
|
GO:0043005
|
neuron projection
|
5
|
postnatal lethality, complete penetrance
|
MP:0011085
|
167
|
5.56E-16
|
GRN2
|
GO Biological Process
|
GO:0051960
|
regulation of nervous system development
|
12
|
neonatal lethality, complete penetrance
|
MP:0011087
|
33
|
1.35E-33
|
GRN3
|
GO Cellular Component
|
GO:0044456
|
synapse part
|
6
|
postnatal lethality, incomplete penetrance
|
MP:0011086
|
245
|
1.18E-10
|
GRN4
|
GO Biological Process
|
GO:0098609
|
cell-cell adhesion
|
4
|
lethality at weaning, complete penetrance
|
MP:0011083
|
41
|
4.63E-13
|
GRN5
|
GO Cellular Component
|
GO:0036477
|
somatodendritic compartment
|
5
|
lethality at weaning
|
MP:0008569
|
180
|
3.16E-13
|
GRN6
|
GO Biological Process
|
GO:0051962
|
positive regulation of nervous system development
|
9
|
neonatal lethality, complete penetrance
|
MP:0011087
|
110
|
7.28E-20
|
GRN7
|
GO Molecular Function
|
GO:0005509
|
calcium ion binding
|
3
|
embryonic lethality between implantation and placentation, complete penetrance
|
MP:0011095
|
329
|
2.01E-02
|
GRN8
|
GO Cellular Component
|
GO:0098794
|
postsynapse
|
4
|
lethality at weaning, complete penetrance
|
MP:0011083
|
150
|
9.09E-13
|
GRN9
|
GO Cellular Component
|
GO:0097060
|
synaptic membrane
|
4
|
lethality at weaning, complete penetrance
|
MP:0011083
|
183
|
2.98E-10
|
GRN10
|
GO Biological Process
|
GO:1901137
|
carbohydrate derivative biosynthetic process
|
6
|
embryonic lethality between implantation and somite formation, incomplete penetrance
|
MP:0011106
|
40
|
1.44E-11
|
GRN11
|
GO Biological Process
|
GO:0098742
|
cell-cell adhesion via plasma-membrane adhesion molecules
|
5
|
neonatal lethality, complete penetrance
|
MP:0011087
|
106
|
1.45E-08
|
GRN12
|
GO Biological Process
|
GO:0044708
|
single-organism behavior
|
6
|
postnatal lethality, incomplete penetrance
|
MP:0011086
|
65
|
9.98E-25
|
GRN13
|
GO Cellular Component
|
GO:0031012
|
extracellular matrix
|
5
|
lethality throughout fetal growth and development, incomplete penetrance
|
MP:0011109
|
14
|
2.45E-14
|
GRN14
|
GO Cellular Component
|
GO:0030424
|
axon
|
5
|
postnatal lethality, complete penetrance
|
MP:0011085
|
178
|
4.09E-10
|
GRN15
|
GO Cellular Component
|
GO:0043025
|
neuronal cell body
|
3
|
embryonic lethality between implantation and placentation, complete penetrance
|
MP:0011095
|
341
|
3.66E-07
|
GRN16
|
GO Biological Process
|
GO:0099537
|
trans-synaptic signaling
|
8
|
lethality at weaning
|
MP:0008569
|
104
|
9.72E-24
|
GRN17
|
GO Biological Process
|
GO:0007268
|
chemical synaptic transmission
|
8
|
lethality at weaning
|
MP:0008569
|
94
|
3.79E-24
|
GRN18
|
GO Biological Process
|
GO:0042391
|
regulation of membrane potential
|
7
|
postnatal lethality, incomplete penetrance
|
MP:0011086
|
129
|
3.50E-16
|
GRN19
|
GO Cellular Component
|
GO:0045211
|
postsynaptic membrane
|
3
|
lethality at weaning, complete penetrance
|
MP:0011083
|
143
|
6.43E-12
|
GRN20
|
GO Molecular Function
|
GO:0008233
|
peptidase activity
|
2
|
embryonic lethality during organogenesis, complete penetrance
|
MP:0011098
|
304
|
2.26E-02
|
GRN21
|
GO Cellular Component
|
GO:0005578
|
proteinaceous extracellular matrix
|
5
|
lethality throughout fetal growth and development, incomplete penetrance
|
MP:0011109
|
6
|
1.90E-17
|
GRN22
|
GO Molecular Function
|
GO:0070011
|
peptidase activity, acting on L-amino acid peptides
|
2
|
postnatal lethality, complete penetrance
|
MP:0011085
|
236
|
1.06E-02
|
GRN23
|
GO Biological Process
|
GO:0050804
|
modulation of synaptic transmission
|
6
|
preweaning lethality, incomplete penetrance
|
MP:0011110
|
194
|
6.09E-11
|
GRN24
|
GO Cellular Component
|
GO:0044297
|
cell body
|
2
|
embryonic lethality between implantation and placentation, complete penetrance
|
MP:0011095
|
72
|
5.69E-16
|
GRN25
|
GO Cellular Component
|
GO:0009986
|
cell surface
|
7
|
neonatal lethality, complete penetrance
|
MP:0011087
|
22
|
1.97E-21
|
GRN26
|
GO Cellular Component
|
GO:0098590
|
plasma membrane region
|
3
|
postnatal lethality, incomplete penetrance
|
MP:0011086
|
352
|
2.31E-07
|
Legend: *, number of MOS phenotypes among top 500 enriched GO terms. |
Collectively, these observations support the conclusion that GRNs comprising the GRD of human embryo regulatory LTR elements could be viewed as an intrinsically integrated compendium of thousands genotype-phenotypic trait associations-defined GRMs, many of which may represent the evolutionary selection units defined by inherent genotype-phenotype associations affecting species’ fitness and survival.
Gene Set Enrichment Analyses (GSEA) of down-stream target genes of human embryo regulatory LTR elements revealed global multifaceted impacts on physiological traits, developmental phenotypes, and pathological conditions of Modern Humans.
To gain further insights into architectural frameworks and functional features of the GRD associated with human embryo regulatory LTR elements, the Enrichr bioinformatics platform was utilized to carry out the comprehensive series of GSEA employing more than 70 genomics and proteomics databases (Methods). GSEA were executed by imputing 5444 candidate down-stream regulatory target genes of human embryo regulatory LTR elements and retrieving significantly enriched records capturing phenotypic traits affected by subsets of candidate LTR-regulated genes (Tables 7–8; Supplementary Figure S1; Supplementary Tables S1-S2). These analyses documented global multifaceted impacts of genes representing putative down-stream regulatory targets of human embryo regulatory LTR elements on physiological traits, developmental phenotypes, and pathological conditions of Modern Humans (Table 7; Supplementary Figure S1; Supplementary Table S1).
Table 7
Gene Set Enrichment Analyses (GSEA) of 5444 genes comprising putative down-stream targets of human embryo regulatory LTRs revealed their global multifaceted impacts on physiological traits, developmental phenotypes, and pathological conditions of Modern Humans.
Classification category
|
Number of significantly enriched records*
|
Common & rare human diseases
|
|
DisGeNET
|
477
|
Orphanet Augmented 2021
|
1095
|
Rare Diseases AutoRIF ARCHS4 Predictions
|
140
|
Rare Diseases GeneRIF ARCHS4 Predictions
|
97
|
Rare Diseases GeneRIF Gene Lists
|
96
|
Rare Diseases AutoRIF Gene Lists
|
610
|
Disease Perturbations from GEO down
|
128
|
Disease Perturbations from GEO up
|
116
|
Diabetes Perturbations GEO 2022
|
217
|
MAGMA Drugs and Diseases
|
82
|
HDSigDB Human 2021 (Huntington disease)
|
1004
|
HDSigDB Mouse 2021 (Huntington disease)
|
677
|
MSigDB Oncogenic Signatures
|
49
|
Jensen database of human diseases
|
79
|
RNA-Seq Disease Gene and Drug Signatures from GEO
|
323
|
Developmental & Regulatory Pathways
|
|
WikiPathway 2021 Human
|
33
|
KEGG 2021 Human
|
17
|
BioPlanet 2019
|
48
|
Panther 2016
|
1
|
Reactome 2022
|
26
|
Transcription Factor PPIs
|
1
|
ARCHS4 TF Co-expression
|
284
|
ARCHS4 Kinases Co-expression
|
106
|
TF Perturbations Followed by Expression
|
532
|
Gene Perturbations from GEO up
|
342
|
Gene Perturbations from GEO down
|
252
|
Endogenous Ligand Perturbations from GEO up
|
43
|
Endogenous Ligand Perturbations from GEO down
|
46
|
lncHUB lncRNA Co-Expression
|
220
|
Kinase Perturbations from GEO down
|
17
|
Kinase Perturbations from GEO up
|
27
|
Gene Ontology
|
|
GO Cellular Component 2021
|
17
|
GO Molecular Function 2021
|
12
|
GO Biological Process 2021
|
177
|
MGI Mammalian Phenotype Level 4 2021
|
332
|
Human Phenotype Ontology
|
27
|
Single cell genomics & epigenetics
|
|
PanglaoDB Augmented 2021
|
87
|
CellMarker Augmented 2021
|
255
|
Azimuth Cell Types 2021
|
101
|
HuBMAP ASCTplusB Augmented 2022
|
191
|
Human tissues
|
|
GTEx Tissue Expression Up
|
513
|
GTEx Tissue Expression Down
|
163
|
Human Gene Atlas
|
15
|
ARCHS4 Tissues
|
48
|
Allen Brain Atlas up
|
787
|
Allen Brain Atlas down
|
245
|
RNAseq and Proteomics
|
|
RNAseq Automatic GEO Signatures Human Up
|
1494
|
RNAseq Automatic GEO Signatures Human Down
|
1571
|
ProteomicsDB 2020
|
6
|
CCLE Proteomics 2020
|
49
|
Aging phenotype signatures
|
|
GTEx Aging Signatures 2021
|
13
|
Aging Perturbations from GEO up
|
10
|
Aging Perturbations from GEO down
|
30
|
Genotype-phenotype associations
|
|
GWAS Catalog 2019
|
271
|
OMIM Expanded
|
175
|
ClinVar 2019
|
1
|
PhenGenI Association 2021
|
150
|
PheWeb 2019
|
862
|
dbGaP
|
144
|
Host-pathogen interactions
|
|
COVID-19 Related Gene Sets 2021
|
138
|
Virus Perturbations from GEO up
|
55
|
Virus Perturbations from GEO down
|
55
|
Microbe Perturbations from GEO up
|
18
|
Microbe Perturbations from GEO down
|
13
|
Legend: * Numbers of significantly enriched records identified by GSEA during analyses of corresponding genomics and proteomics databases to define entries at the statistical significance threshold of adjusted P value (FDR q value) < 0.05. |
Table 8
Quantitative profiles of a directed acyclic graph (DAG)* test-validated genomic regulatory modules (GRMs)** within GSEA-defined genomic regulatory panels (GRPs)*** of the genomic dominion of human embryo regulatory LTRs.
Genomic regulatory panels (GRPs)
|
Number of LTRs
|
Number of linked genes
|
GO Biological Process
|
GO
Cellular Component
|
GO Molecular Function
|
Human Phenotype
|
Mouse Phenotype Single KO
|
Mouse Phenotype
|
Number of GRMs
|
DAG Visualization Report
|
Neurotrophins (neurotrophic factors)
|
8
|
12
|
12
|
0
|
2
|
30
|
70
|
77
|
191
|
Figure S6
|
Neurotrophins (expanded)
|
29
|
147
|
4
|
3
|
8
|
0
|
70
|
59
|
144
|
Supplementary Data Sets S2
|
Neurexins and Neuroligins R-HSA-6794361
|
120
|
30
|
61
|
19
|
4
|
0
|
0
|
0
|
84
|
Figure S5
|
Neurotransmitter receptor activity
|
137
|
30
|
34
|
29
|
38
|
0
|
20
|
32
|
153
|
Figure S7
|
Signaling by Receptor Tyrosine Kinases R-HSA-9006934
|
241
|
81
|
172
|
8
|
12
|
0
|
24
|
69
|
285
|
Figure S9
|
RET-GDNF pathway
|
115
|
28
|
181
|
0
|
1
|
8
|
107
|
191
|
488
|
Figure S4
|
Neuroactive ligand-receptor interactions
|
305
|
84
|
85
|
30
|
43
|
0
|
80
|
85
|
323
|
Figure S8
|
GSEA-defined GRN of regulatory LTRs
|
966
|
255
|
287
|
38
|
20
|
2
|
136
|
264
|
747
|
Figure S10
|
GSEA-defined GRN of regulatory LTRs (all significant GRMs)#
|
966
|
255
|
1359
|
96
|
132
|
178
|
955
|
1165
|
3885
|
Supplementary Data Sets S2
|
Legend: *, A directed acyclic graph (DAG) test was carried out on the enriched terms from a single ontology-specific table generated by the GREAT algorithm; |
**, For each GRP, reported numbers of GRMs were determined using a directed acyclic graph (DAG) test applied to 6 independent gene ontology-specific enrichment analyses employing the GREAT algorithm (see Supplementary Data Sets S2 for complete reports); |
***, GSEA-defined GRPs associated with human embryo regulatory LTRs were identified using the Enrichr Bioinformatics Platform at the statistical significance threshold of the adjusted P-value < 0.05 (Supplementary Data Sets S2); |
#, reported numbers of GRMs were identified by the GREAT algorithm at the statistical significance threshold of the Binominal FDR Q-value < 0.001 (Supplementary Data Sets S2). |
Broadly, identified in this study regulatory LTR target genes have been implicated in pathogenesis and manifestation of a large spectrum of human common, orphan, and rare disorders; they are engaged in numerous developmental and regulatory pathways contributing to differentiation and specialization processes across the hierarchy of human organs, tissues, cells, and sub-cellular structures; have been shown to affect phenotypic traits at the multiple levels of a cellular molecular composition and regulatory infrastructure, including signal transduction triggered by numerous endogenous ligands and multiple families of transmembrane receptors; structural/functional organization of mitochondria, endoplasmic reticulum, and nucleus; activities of transcription factors and chromatin remodeling effectors. Analyzed herein regulatory LTR target genes have been linked with the human aging phenotype in gene expression profiling experiments and with a multitude of both physiological and pathological human phenotypes in genotype-phenotype association studies across various analytical platforms and databases, in particular, human disease associations’ databases (Table 7; Supplementary Figure S1; Supplementary Table S1).
Protein products of regulatory LTR down-stream target genes have been documented to participate in protein-protein interactions of endogenous multiprotein complexes as well as host-pathogen interactions during viral and microbial infections. GSEA of regulatory LTR down-stream target genes employing the OMIM Expanded database of protein-protein interactions (PPI) of human disease-causing genes identified 34 regulatory LTR target genes, protein products of which appear engaged in PPIs with protein products of disease-causing genes implicated in development of a wide spectrum of human disorders (Supplementary Figure S1; Supplementary Table S1). Genes comprising the 34-gene signature of regulatory LTR targets are significantly enriched in 1387 records of human diseases (GSEA of the DisGeNET database; adjusted p-value < 0.05). These observations prompted in depth follow-up investigations of a regulatory network comprising 88 regulatory LTR elements (27LTR7; 35 MLT2A1; and 26 MLT2A2 sequences), which were mapped as putative up-stream regulator elements of the 34 genes.
Characterization of 88 human embryo regulatory LTRs comprising putative up-stream regulatory elements of PPI networks of human disease-causing genes.
Results of the DAG analyses of 88 human embryo regulatory LTR elements revealed highly complex hierarchical networks of significantly enriched phenotypic traits identified by analyses of GO Biological Process and Mouse Phenotype databases, while analyses of GO Cellular Component and GO Molecular Function documented relatively simple architectures of significantly enriched phenotypic traits (Fig. 4; Supplementary Table S1; Supplementary Data Set S2). Overall, 1533 significantly enriched records linked to down-stream target genes of 88 regulatory LTR elements have been identified at the statistical threshold of the Binominal FDR Q-value < 0.001. These include 432; 36; 42; 58; 243; and 722 significantly enriched phenotypic traits identified by Gene Ontology analyses of GO Biological Process; GO Cellular Component; GO Molecular Function; Human Phenotype; Mouse Phenotype Single KO; and Mouse Phenotype databases, respectively (Supplementary Data Set S2).
Of note, GO enrichment analyses of 88 regulatory LTR elements employing the Mouse Phenotype Single KO database identified numerous significantly records of genes annotated to distinct mammalian offspring’s survival (MOS) phenotypes. The list of enriched GO terms of this category includes several prenatal and perinatal lethality phenotypes [perinatal lethality (MP:0002081); perinatal lethality, complete penetrance (MP:0011089); perinatal lethality, incomplete penetrance (MP:0011090); prenatal lethality, complete penetrance (MP:0011091); prenatal lethality, incomplete penetrance (MP:0011101)]; distinct embryonic lethality phenotypes [embryonic lethality, complete penetrance (MP:0011092); embryonic lethality between implantation and somite formation (MP:0006205); embryonic lethality during organogenesis MP:0006207); embryonic lethality between implantation and placentation (MP:0009850); and the premature death phenotype (MP:0002083).
GSEA of 34 genes identified hundreds of significantly enriched phenotypic traits across numerous genomics and proteomics databases, representative examples of which are reported in the Fig. 5; Supplementary Figure S1; Supplementary Table S1. In multiple instances numbers of significantly enriched phenotypic records linked with 34 genes are markedly higher compared to the parent set of 5444 genes (Fig. 5B; Supplementary Figure S1; Supplementary Table S1). For example, there are 121 and 124 significantly enriched records of 34 genes identified by GSEA of the Transcription Factors PPI database and the Hub Proteins PPI database, respectively, while there is only one significantly enriched record in each database identified by GSEA of 5444 genes (Fig. 5B). Similarly, there are 1387 and 477 significantly enriched records of human diseases identified by GSEA of 34 genes and 5444 genes, respectively, employing the DisGeNET database (Supplementary Figure S1; Supplementary Table S1). Apparent qualitative distinctions were noted between top 10 significantly enriched records of human diseases: all top 10 GSEA records of 34 genes’ enrichment were various records of malignant neoplasms, while top 10 records linked with GSEA of 5444 genes were neurodevelopmental and substance abuse disorders (Supplementary Figure S1; Supplementary Table S1). In contrast to common human diseases, 78 and 1095 significantly enriched records of human orphan disorders have been identified by GSEA of 34 genes and 5444 genes, respectively, employing the Orphanet Augmented 2021 database (Supplementary Figure S1; Supplementary Table S1).
Follow-up analyses revealed that overall there are 206 human genes encoding proteins manifesting significantly enriched records of protein-protein interactions (PPIs) with protein products encoded by 34 genes (Figs. 5–6; Supplementary Table S1; Supplementary Summaries S4-S5). Because protein products of 34 genes manifest significant enrichment of PPIs with human disease-causing genes (Fig. 5; Supplementary Figure S1; Supplementary Table S1; Supplementary Summary S4), these observations suggest that a network of proteins encoded by 240 genes may be engaged in PPIs with human disease-causing genes as well. This hypothesis was tested by performing GSEA of 240 genes comprising the putative PPI network of human disease-causing genes (Figs. 5–6; Supplementary Table S1; Supplementary Summaries S4-S5).
GSEA of 240 genes consistently demonstrated that numbers of significantly enriched phenotypic traits linked with 240 genes are markedly higher compared to either 34 genes or the parent set of 5444 genes (Figs. 5–6; Supplementary Table S1; Supplementary Summaries S4-S5). GSEA employing the DisGeNET database identified 3298; 1387; and 477 significantly enriched records of human diseases linked with 240 genes, 34 genes, and 5444 genes, respectively (Fig. 5; Supplementary Figure S1; Supplementary Table S1; Supplementary Summary S4). Qualitatively distinct patterns noted between top 10 significantly enriched records of human diseases associated with 34 genes versus 5444 genes (Supplementary Figure S1; Supplementary Table S1) were recapitulated by GSEA of 240 genes: all top 10 GSEA records linked with 240 genes were multiple records of malignant neoplasms, while top 10 records linked with GSEA of 5444 genes were neurodevelopmental and substance abuse disorders (Fig. 5; Supplemental Table S1; Supplementary Summary S4). In contrast to common human diseases, GSEA employing the Orphanet Augmented 2021 database identified 199 and 1095 significantly enriched records of human orphan disorders associated with 240 genes and 5444 genes, respectively (Fig. 5; Supplementary Table S1; Supplementary Summary S4).
Particularly intriguing observations were recorded during GSEA of proteomics databases focused on the molecular architecture of a large compendium of endogenous PPI networks. There are 268 and 228 significantly enriched records of subsets of 240 genes identified by GSEA of the Transcription Factors PPI database and the Hub Proteins PPI database, respectively, while there are 121 and 124 significantly enriched record in corresponding databases identified by GSEA of 34 genes (Figs. 5–6; Supplementary Table S1; Supplementary Summary S4-S5). Similarly, there are 606 and 62 significantly enriched records of 240 genes identified by GSEA of the CORUM Protein Complexes database and the SILAC Phosphoproteomics database, respectively, while there are 143 and 22 significantly enriched record in corresponding databases identified by GSEA of 34 genes (Fig. 6; Supplementary Table S1; Supplementary Summary S5). Strikingly, GSEA employing the NURSA Human Endogenous Complexome database identified 1783 significantly enriched records linked with 240 genes, while no significantly enriched traits were recorded for GSEA of either 34 genes or 5444 genes (Fig. 6; Supplementary Table S1; Supplementary Summary S5). These observations indicate that protein products selected from a panel of 240 genes are significantly enriched in PPIs of a marked majority of endogenous human multimolecular complexes recorded in the NURSA Human Endogenous Complexome database. It follows, that a set of 88 human embryo regulatory LTR elements may extend its regulatory impacts on vast networks of human endogenous multimolecular complexes via protein products of 34 down-stream target genes engaged in PPIs with 268 transcription factors and 228 hub proteins. Strikingly, these networks appear impacting PPIs that are significantly enriched in 6584 records of virus-host PPIs documented in Virus-Host PPI P-HIPSTer 2020 database and 2071 records of human rare disorders catalogued in the Rare Diseases AutoRIF Gene Lists database (Supplementary Table S1).
Notably, GSEA of 240 genes employing the OMIM Expanded database of PPIs of human disease-causing genes further validated and expanded this concept by identification of 51 genes, protein products of which appear engaged in PPIs with protein products of disease-causing genes implicated in development of numerous human disorders of an extremely broad spectrum of clinical phenotypes (Fig. 6; Supplementary Table S1; Supplementary Summary S5).
Collectively, present analyses suggest that a common molecular mechanism driving development of human pathologies may be associated with altered expression of proteins encoded by regulatory LTR target genes and down-stream effects on PPI networks engaging protein products of disease-causing genes. Malfunctions of endogenous multimolecular complexes tagged by PPIs of proteins encoded by regulatory LTR target genes may contribute to molecular pathogenesis of an exceedingly large number of human diseases, which exhibit highly diverse spectrum of clinical manifestations including malignancies of different solid organs and hematopoietic system; blood, lymphatic and endocrine system disorders; immunoinflammatory and autoimmune maladies; neurodevelopmental, neuropsychiatric, and neurodegenerative diseases.
Genes representing common regulatory targets of human embryo regulatory LTR elements are segregated into sub-sets of genetic loci coalesced within genomic regulatory pathways of common biological functions.
Notable common features of all identified herein GRNs governed by human embryo regulatory LTR elements are the quantitatively balanced numbers of residing within GRN’s boundaries regulatory LTR elements derived from LTR7, MLT2A1, and MLT2A2 sequences (Table 4). These observations indicate that LTR elements of distinct families may affect genes (including sub-sets of common down-stream target genes) comprising constituents of corresponding GRNs, which then likely to exert biologically discernable effects on corresponding phenotypic traits. If the model is correct, this consonant regulatory framework should be observable within previously defined signal transduction pathways known to affect specific phenotypes.
The validity of this hypothesis was tested by performing GSEA of 709 genes comprising a set of common down-stream regulatory targets of LTR7, MLT2A1, and MLT2A2 elements (Table 1). In these analytical experiments, GSEA-defined significantly enriched sets of genes operating within specific pathways were mapped to sub-sets of their putative up-stream regulatory elements among human embryo regulatory LTRs (Supplementary Table S2; Supplementary Data Set S2). Then each of these sub-sets of regulatory LTRs associated with specific pathways were subjected to GO analyses using the GREAT algorithm. Results of these analyses demonstrate that genetic loci representing common regulatory targets of human embryo regulatory LTRs appear segregated into relatively small sets of LTR/gene panels coalesced within genomic regulatory pathways of common biological functions (Tables 8–10; Supplementary Figures S2-S10; Supplementary Table S2; Supplementary Data Set S2). These GSEA/GREAT analyses-defined sets of genes and associated regulatory LTR elements were designated Genomic Regulatory Pathways (GRPs) because they were identified by GSEA as significantly enriched genetic loci (adjusted p value < 0.05) assigne to corresponding previously defined regulatory pathways.
Table 9
Enrichment of regulatory LTR elements and genes associated with Mammalian Offspring Survival phenotypes within the regulatory LTR-governed GRM of reduced fertility phenotype.
Phenotype
|
Binominal Rank
|
Binominal Raw P-Value
|
Binominal FDR Q-Value
|
Binominal Fold Enrichment
|
Binominal Observed LTR Hits
|
Binominal LTR Set Coverage
|
Hypergeometric Rank
|
Hypergeometric FDR Q-Value
|
Hypergeometric Fold Enrichment
|
Hypergeometric Observed Gene Hits
|
Hypergeometric Total Genes
|
Hypergeometric Gene Set Coverage
|
reduced fertility
|
1
|
0
|
0
|
11.34
|
311
|
1.00
|
1
|
1.08E-62
|
13.04
|
77
|
652
|
0.45
|
postnatal lethality
|
14
|
8.35E-84
|
5.71E-81
|
3.62
|
218
|
0.70
|
20
|
2.46E-15
|
4.01
|
50
|
1378
|
0.29
|
lethality during fetal growth through weaning
|
29
|
7.68E-68
|
2.53E-65
|
2.51
|
248
|
0.80
|
21
|
3.00E-15
|
3.09
|
65
|
2325
|
0.38
|
postnatal lethality, incomplete penetrance
|
48
|
5.59E-49
|
1.11E-46
|
3.83
|
141
|
0.45
|
29
|
5.90E-14
|
5.07
|
37
|
806
|
0.22
|
infertility
|
104
|
7.29E-34
|
6.70E-32
|
3.61
|
109
|
0.35
|
115
|
1.65E-07
|
3.59
|
29
|
893
|
0.17
|
male infertility
|
113
|
2.04E-32
|
1.73E-30
|
4.29
|
89
|
0.29
|
154
|
5.62E-06
|
3.77
|
22
|
644
|
0.13
|
premature death
|
116
|
8.52E-32
|
7.02E-30
|
2.61
|
147
|
0.47
|
74
|
1.05E-08
|
3.04
|
41
|
1491
|
0.24
|
perinatal lethality
|
141
|
3.62E-30
|
2.46E-28
|
2.41
|
155
|
0.50
|
39
|
9.61E-12
|
3.60
|
44
|
1349
|
0.26
|
postnatal lethality, complete penetrance
|
157
|
1.63E-28
|
9.94E-27
|
3.30
|
103
|
0.33
|
128
|
6.13E-07
|
3.97
|
24
|
667
|
0.14
|
preweaning lethality, incomplete penetrance
|
282
|
5.62E-20
|
1.91E-18
|
3.60
|
67
|
0.22
|
268
|
0.000239
|
3.73
|
16
|
474
|
0.09
|
neonatal lethality, incomplete penetrance
|
288
|
1.06E-19
|
3.53E-18
|
3.51
|
68
|
0.22
|
137
|
2.03E-06
|
4.92
|
18
|
404
|
0.11
|
lethality at weaning, complete penetrance
|
310
|
6.32E-19
|
1.95E-17
|
7.88
|
32
|
0.10
|
833
|
0.038637
|
6.50
|
4
|
68
|
0.02
|
neonatal lethality
|
319
|
1.67E-18
|
5.01E-17
|
2.30
|
112
|
0.36
|
58
|
5.92E-10
|
3.87
|
35
|
998
|
0.21
|
perinatal lethality, incomplete penetrance
|
356
|
6.07E-17
|
1.63E-15
|
3.49
|
59
|
0.19
|
185
|
2.14E-05
|
4.95
|
15
|
335
|
0.09
|
neonatal lethality, complete penetrance
|
359
|
7.04E-17
|
1.88E-15
|
2.52
|
90
|
0.29
|
67
|
3.47E-09
|
4.49
|
28
|
689
|
0.16
|
lethality at weaning
|
399
|
1.62E-15
|
3.87E-14
|
5.73
|
33
|
0.11
|
812
|
0.035547
|
5.11
|
5
|
108
|
0.03
|
abnormal cell death
|
436
|
2.42E-14
|
5.31E-13
|
2.07
|
107
|
0.34
|
291
|
0.000436
|
2.27
|
31
|
1510
|
0.18
|
prenatal lethality, incomplete penetrance
|
492
|
6.78E-13
|
1.32E-11
|
2.99
|
54
|
0.17
|
572
|
0.011505
|
3.01
|
12
|
440
|
0.07
|
increased cell death
|
565
|
2.24E-11
|
3.79E-10
|
2.25
|
74
|
0.24
|
375
|
0.001637
|
2.58
|
21
|
898
|
0.12
|
prenatal lethality, complete penetrance
|
926
|
2.07E-07
|
2.14E-06
|
2.38
|
42
|
0.14
|
837
|
0.038723
|
2.78
|
10
|
397
|
0.06
|
female infertility
|
1090
|
2.70E-06
|
2.37E-05
|
2.49
|
32
|
0.10
|
531
|
0.008139
|
3.37
|
11
|
360
|
0.06
|
Table 10
High-confidence LTR-regulated genes comprise a significant majority of 5444 genes and 709 genes defined as cumulative and common sets of putative down-stream regulatory targets for LTR7 & MLT2A1 & MLT2A2 human embryo regulatory LTR loci.
Classification category
|
Number of genes
|
Percent
|
Common set of LTR7 & MLT2A1 & MLT2A2 target genes
|
709
|
100.00
|
LTR5_Hs regulated genes
|
195
|
27.50
|
LTR7Y/B regulated genes
|
202
|
28.49
|
HERVH lncRNA regulated genes
|
253
|
35.68
|
SVA regulated genes
|
96
|
13.54
|
LTR5_Hs/SVA regulated genes
|
158
|
22.28
|
All LTR-regulated common target genes
|
473
|
66.71
|
Cumulative set of LTR7 & MLT2A1 & MLT2A2 target genes
|
5444
|
100.00
|
LTR5_Hs regulated genes
|
1344
|
24.69
|
LTR7Y/B regulated genes
|
1873
|
34.40
|
HERVH lncRNA regulated genes
|
1876
|
34.46
|
SVA regulated genes
|
841
|
15.45
|
LTR5_Hs/SVA regulated genes
|
937
|
17.21
|
All LTR-regulated target genes
|
3659
|
67.21
|
Similar to GRNs, all identified by GSEA/GREAT analyses GRPs contain numerous regulatory LTRs of LTR7, MLT2A1, and MLT2A2 families and harbor hundreds of GO annotated GRMs. However, GRPs appear to represent quantitatively smaller genomic regulatory units compared to 26 GRNs based on the assessment of numbers of significantly enriched constituent GRMs (Tables 3–5; Table 8; Supplementary Table S2; Supplementary Data Sets S1; S2) as well as relatively scarce representations of constituent GRMs within DAG analyses-defined GO-specific hierarchical networks (Figs. 2–3; Supplementary Figures S3-S10; Supplementary Data Sets S1; S2). Various regulatory aspects of development and functions of peripheral and central nervous systems at cell type-specific levels and distinct morphological regions appear to represent a dominant common theme shared by GRPs (Supplementary Figures S3-S10; Supplementary Data Set S2). These observations suggest that GSEA-defined GRPs, similar to GO annotations-defined GRMs and GRNs, may constitute a constellation of genomic regulatory units each of which is aligned to specialized biological pathways united by a functional synergy required to preserve the developmental and operational integrity of nervous system.
It has been noted that a compendium of GSEA/GREAT analyses-defined GRPs comprising 966 human embryo regulatory LTR elements and 255 down-stream target genes (Supplementary Figure S10; Table 8; Supplementary Data Set S2) manifests genomic features quantitatively and qualitatively resembling GRNs, including overlapping subsets of regulatory LTRs and down-stream target genes assigned by GO annotations to 26 GRNs and their constituent GRMs. Intriguing examples of these common GRMs of potential evolutionary significance include sets of regulatory LTRs/down-stream target genes annotated to significantly enriched records of mammalian offspring survival (MOS) phenotypes. GO annotations employing the Mouse Phenotype Single KO database identified the following significantly-enriched records of relevance to MOS phenotypes:
1. MP:0002082; postnatal lethality; 247 LTRs; 45 genes; Binominal FDR Q-value = 1.59E-17;
2. MP:0002081; perinatal lethality; 251 LTRs; 41 genes; Binominal FDR Q-value = 7.30E-15;
3. MP:0011089; perinatal lethality, complete penetrance; 91 LTRs; 14 genes; Binominal FDR Q-value = 6.84E-12;
4. MP:0011090; perinatal lethality, incomplete penetrance; 88 LTRs; 13 genes; Binominal FDR Q-value = 5.84E-11;
5. MP:0011086; postnatal lethality, incomplete penetrance; 149 LTRs; 30 genes; Binominal FDR Q-value = 7.92E-10;
6. MP:0011083; lethality at weaning, complete penetrance; 33 LTRs; 5 genes; Binominal FDR Q-value = 1.43E-08;
7. MP:0011085; postnatal lethality, complete penetrance; 106 LTRs; 20 genes; Binominal FDR Q-value = 2.25E-07;
8. MP:0011110; preweaning lethality, incomplete penetrance; 94 LTRs; 14 genes; Binominal FDR Q-value = 1.16E-06;
9. MP:0001926; female infertility; 65 LTRs; 11 genes; Binominal FDR Q-value = 1.36E-06;
Collectively, these observations indicate that two critically important classes of genetic loci appear co-segregated within identified herein human embryo regulatory LTR-governed genomic regulatory units of various architecture: a). Genes essential for the integrity of nervous system development and functions; b). Genes essential for mammalian offspring survival. It seems likely that this striking co-segregation may reflect co-regulation of these two classes of genes by common sets of regulatory LTR elements. Since knockout of many genes within these regulatory LTR-governed GRNs causes offspring lethality, it is logical to expect that their sustained activity would be essential for offspring survival. Sustained activity of MOS genes within regulatory LTR-governed GRNs could be achieved either by relieving the repressor actions of regulatory LTRs or maintaining the activator effects of regulatory LTRs. Alignments of this genomic regulatory architecture of human embryo regulatory LTR-governed GRNs with control of genes essential for development and functions of nervous system may have contributed to acceleration of central nervous system’s dramatic changes during primate evolution.
Enrichment patterns of genomic loci annotated to mammalian phenotypes of reduced fertility and infertility within GRNs governed by human embryo regulatory LTR elements.
One of the potential biological consequences of essential MOS genes being under control of human embryo regulatory LTR elements is potential effects of this genomic regulatory architecture on species’ fertility phenotypes. In agreement with this hypothesis, it has been observed that GO-defined GRMs annotated to reduced fertility and/or infertility phenotypes are consistently enriched within 18 of 26 GRNs governed by human embryo regulatory LTR elements (Supplementary Table S3). Notably, all GRNs annotated to phenotypic traits closely related to development and functions of nervous system harbor significantly enriched GRMs of reduced fertility and/or infertility phenotypes (Supplementary Table S3). In contrast, GRNs that are more distantly related to or not directly associated with nervous system development and functions do not harbor GRMs of reduced fertility and/or infertility phenotypes among significantly enriched GRMs. These findings suggest that during primate evolution functionally essential fertility genes (that is, genes knockout of which causes reduced fertility and/or infertility phenotypes) and genes essential for development and functions of nervous system were placed under regulatory control of human embryo regulatory LTR elements. It follows that biologically meaningful repressor effects on both classes of genes during ontogenesis would cause defects in development and functions of nervous system as well as would result in reduced fertility and/or infertility phenotypes, thus directly affecting the survivability of the offspring lineage and evolutionary fitness of species.
It was of interest to determine whether GO analyses of human embryo regulatory LTR elements would reveal the potential genomic regulatory connectivity between functionally essential fertility genes and genes essential for mammalian offspring survival. To this end, GO annotations records of the regulatory LTR-governed GRM of the reduced fertility phenotype (MP:0001921) were analyzed to identify the significantly enriched records of regulatory LTR elements and genes associated with Mammalian Offspring Survival (MOS) phenotypes (Table 9). It has been observed that one of the important features of the regulatory LTR-governed GRM of the reduced fertility phenotype is the significant enrichment of regulatory LTR elements and genes annotated by GO analyses to multiple records of MOS phenotypes (Table 9). These observations indicate that abrogated functions of genes essential for mammalian offspring survival (that is, genes knockout of which causes the offspring lethality phenotypes) may represent one of key genetic elements contributing to reduced fertility and/or infertility phenotypes. Based on reported herein observations, it has been concluded that during primate evolution functionally essential fertility genes, genes essential for offspring survival, and genes essential for development and functions of nervous system may have been placed under regulatory control of human embryo regulatory LTR elements. This implied genomic regulatory connectivity may constitute a critical element of evolutionary concordance linking the requirement of balanced and coherent development and functions of nervous system to the survivability of the offspring lineage and evolutionary fitness of primate species.
High-confidence down-stream target genes of HERVs constitute a significant majority of candidate regulatory targets of human embryo regulatory LTR elements.
Reported herein inferences of phenotypic impacts of HERVs was made based on the analyses of potential regulatory impacts of human embryo regulatory LTR elements on down-stream target genes without taking into account direct experimental evidence of gene expression changes in human cells following targeted genetic manipulations of HERVs. To address this limitation, genes expression of which is significantly altered in human cells subjected to either shRNA-mediated interference or CRISPR-guided epigenetic silencing directed at defined genetic loci of specific HERV families were identified among candidate down-stream regulatory targets of human embryo regulatory LTR elements. Then proportions of these high-confidence down-stream regulatory targets of regulatory LTRs with experimentally documented gene expression changes were calculated (Table 10).
Results of these analyses demonstrate that selective targeting in human cells of defined regulatory sequences encoded by HERVs caused significant expression changes of sub-sets of genes among candidate down-stream regulatory targets of human embryo regulatory LTRs (Table 10; Supplementary Table S2). These significant changes of gene expression were documented in multiple independent experiments employing various human cell lines, which were genetically engineered using either shRNA interference or CRISPR-guided epigenetic silencing tools tailored for selective targeting of distinct regulatory elements of HERVs (including regulatory LTRs and long non-coding RNAs, lncRNAs). While identified proportions of genes comprising high-confidence down-stream regulatory targets varied from 14–36% in different experimental settings, cumulative sets of identified herein high-confidence down-stream regulatory targets constitute 67% of candidate regulatory LTR-regulated genes (Table 10; Supplementary Table S2). Based on these observations, it has been concluded that high-confidence regulatory LTR-regulated genes defined by expression profiling experiments comprise a significant majority of 5444 genes and 709 genes, which represent, respectively, cumulative and common sets of identified herein putative down-stream regulatory targets of LTR7; MLT2A1; and MLT2A2 human embryo regulatory LTR loci.
It was of interest to determine whether the inferences of potential global phenotypic impacts of human embryo regulatory LTR elements on human development and pathophysiology could be reproduced by GSEA of high-confidence down-stream regulatory target genes of regulatory LTRs. To address this question, significantly enriched records identified by GSEA of 709 genes and 473 genes during analyses of corresponding genomics and proteomics databases were retrieved and catalogued at the statistical significance threshold of adjusted P value (FDR q value) < 0.05 (Table 11; Supplementary Figure S2; Supplementary Table S2). Side by side comparisons of the results demonstrate that GSEA of 473 high-confidence down-stream regulatory target genes of regulatory LTRs essentially recapitulated the quantitative and qualitative GSEA patterns of 703 genes representing a common set of candidate down-stream regulatory targets of regulatory LTRs (Table 11; Supplementary Figure S2; Supplementary Table S2). Therefore, the inferences of potential global phenotypic impacts of human embryo regulatory LTR elements on human development and pathophysiology that have been made based on GO-guided annotations and GSEA of candidate down-stream target genes are recapitulated by GSEA of gene expression profiling experiments-validated high-confidence down-stream regulatory target genes of HERVs.
Table 11
GSEA of phenotypic impacts of common sets of LTR7, MLT2A1, and MLT2A2 down-stream target genes defined by proximity placement analyses (709 genes) and by genetic/epigenetic silencing of regulatory LTR loci (473 genes).
Classification category
|
Proximity placement defined genes*
|
Genetic/epigenetic silencing defined genes*
|
Genomics & proteomics databases
|
709 genes
|
473 genes
|
DisGeNET database of human diseases
|
52
|
72
|
Orphanet Augmented 2021
|
697
|
732
|
Rare Diseases AutoRIF ARCHS4 Predictions
|
63
|
65
|
Rare Diseases GeneRIF ARCHS4 Predictions
|
41
|
43
|
Rare Diseases GeneRIF Gene Lists
|
8
|
13
|
Rare Diseases AutoRIF Gene Lists
|
16
|
25
|
WikiPathway 2021 Human
|
5
|
11
|
KEGG 2021 Human
|
2
|
1
|
BioPlanet 2019
|
6
|
6
|
GO Cellular Component 2021
|
17
|
18
|
GO Molecular Function 2021
|
17
|
13
|
GO Biological Process 2021
|
67
|
55
|
MGI Mammalian Phenotype Level 4 2021
|
28
|
57
|
PanglaoDB Augmented 2021
|
66
|
72
|
CellMarker Augmented 2021
|
159
|
187
|
Azimuth Cell Types 2021
|
75
|
89
|
GTEx Tissue Expression Up
|
431
|
555
|
GTEx Tissue Expression Down
|
147
|
221
|
Human Gene Atlas
|
5
|
5
|
ARCHS4 Human Tissues
|
43
|
52
|
Allen Brain Atlas up
|
336
|
312
|
Allen Brain Atlas down
|
165
|
159
|
Disease Perturbations from GEO down
|
55
|
85
|
Disease Perturbations from GEO up
|
22
|
31
|
RNAseq Automatic GEO Signatures Human Up
|
507
|
821
|
RNAseq Automatic GEO Signatures Human Down
|
541
|
856
|
Transcription Factor PPIs
|
1
|
1
|
Reactome 2022
|
13
|
9
|
ARCHS4 Kinases Co-expression
|
61
|
68
|
Panther 2016
|
2
|
2
|
NCI-Nature 2016
|
1
|
4
|
HuBMAP ASCTplusB augmented 2022
|
94
|
91
|
ARCHS4 TF Co-expression
|
204
|
190
|
ProteomicsDB 2020
|
0
|
0
|
Legend: * Numbers of significantly enriched records identified by GSEA of 709 genes and 473 genes during analyses of corresponding genomics and proteomics databases to define entries at the statistical significance threshold of adjusted P value (FDR q value) < 0.05. |
Human embryo regulatory LTRs derived from distinct families of pan-primate endogenous retroviruses and human-specific genomic regulatory sequences (HSRS) share common sets of candidate down-stream target genes.
One of important conclusions that could be derived from reported herein observations is that human embryo regulatory LTR elements could have contributed to primate evolution by driving the emergence of human-specific phenotypic traits. To test this hypothesis and gauge potential impacts of regulatory LTRs on emergence of human-specific phenotypic traits, identified and functionally characterized in this study candidate down-stream regulatory targets of human embryo regulatory LTR elements were cross-referenced with previously reported down-stream target genes of 110,339 human-specific genomic regulatory sequences (HSRS), which are fixed in human population and creation of which was attributed to several distinct mechanisms (Glinsky, 2015–2022; Methods). For each binary category of distinct regulatory loci and corresponding down-stream regulatory targets, common sets of down-stream target genes were identified and evaluated based on two quantitative metrics: a) enrichment of observed numbers of common target genes compared to expected numbers based on the random target distribution model; b) evaluations of relative abundance profiles of common sets of down-stream target genes calculated as the percentage of all down-stream target genes. It has been surmised that identification of significantly enriched common sets of genes independently defined as putative down-stream regulatory targets of HSRS and human embryo regulatory LTRs may reveal common phenotypic traits affected by different classes of genomic regulatory elements.
Consistent with the idea of overlapping sets of down-stream targets for different classes of genomic regulatory elements, genes identified as down-stream regulatory targets of MLT2A2 and MLT2A1 loci are significantly enriched among LTR7 target genes, while genes identified as down-stream regulatory targets of LTR7 loci are significantly enriched among MLT2A2 and MLT2A1 target genes (Fig. 7; Supplementary Table S4). Furthermore, genome-wide connectivity profiles of human embryo regulatory LTRs and putative target genes are significantly recapitulated within genomic regulatory networks presumably governed by 72,349 human-specific mutations of regulatory DNA sequences, including 17,777 fixed human-specific insertions and deletions; 35,074 fixed human-specific neuro-regulatory single nucleotide mutations shared with archaic humans; and 7,236 fixed human-specific regulatory regions devoided of archaic humans’ DNA (Fig. 7; Supplementary Table S4).
In several instances it was possible to dissect common patterns of down-stream regulatory targets’ enrichment of genes linked with different families of human embryo regulatory LTRs vis-a-vis defined sets of human-specific sequences previously identified as components of different GRNs (Fig. 7; Table 12; Supplementary Table S4). These set of analyses included the pairwise comparisons of the relative enrichment profiles of genes comprising down-stream targets of Naïve & Primed hESC functional enhancers and genes linked with 1619 human-specific Naïve & Primed hESC functional enhancers (Barakat et al., 2018; Glinsky and Barakat, 2019); genes comprising down-stream targets of transposable elements (TE) loci expressed in dorsolateral prefrontal cortex (DLPFC) of the adult human brain and genes linked with 4690 human-specific TE loci expressed in human adult DLPFC (Guffanti et al., 2018); genes identified as down-stream targets of a highly-diverse set of 59,732 HSRS and genes associated with 12,262 created de novo HSRS (Glinsky, 2020). Notably, in all instances genes comprising down-stream targets of bona fide human-specific components of genomic regulatory networks appear significantly enriched among genes defined as down-stream targets of different families of human embryo regulatory LTR elements (Fig. 7; Table 12; Supplementary Table S4).
Table 12
Enrichment of genes linked to distinct families of Human-Specific Regulatory Sequences (HSRS) among putative down-stream target genes of distinct families of human embryo regulatory LTR elements.
Classification category
|
Observed
|
Expected
|
Enrichment/Depletion
|
P value*
|
Genes associated with LTR7 regulatory elements
|
2957
|
2957
|
1.00
|
NA
|
Genes linked with Naïve & Primed hESC functional enhancers
|
1703
|
1822
|
0.93
|
0.001763
|
Genes linked with 1619 human-specific Naïve & Primed hESC functional enhancers
|
522
|
203
|
2.57
|
1.67E-37
|
Genes linked with TE loci expressed in human adult DLPFC
|
2262
|
2094
|
1.08
|
8.085E-07
|
Genes linked with 4690 human-specific TE loci expressed in human adult DLPFC
|
1328
|
661
|
2.01
|
4.04E-76
|
Genes associated with 59,732 HSRS
|
2615
|
2080
|
1.26
|
7.732E-68
|
Genes associated with 12,262 created de novo HSRS
|
1390
|
685
|
2.03
|
3.274E-83
|
Genes associated with HSRS (except genes linked with de novo HSRS)
|
1225
|
1395
|
0.88
|
9.634E-06
|
Genes linked with MLT2A1 & MLT2A2
|
3971
|
3971
|
1.00
|
NA
|
Genes linked with Naïve & Primed hESC functional enhancers
|
2121
|
2447
|
0.87
|
1.54E-13
|
Genes linked with 1619 human-specific Naïve & Primed hESC functional enhancers
|
655
|
272
|
2.40
|
8.30E-41
|
Genes linked with TE loci expressed in human adult DLPFC
|
2059
|
2811
|
0.73
|
1.30E-67
|
Genes linked with 4690 human-specific TE loci expressed in human adult DLPFC
|
1903
|
887
|
2.15
|
5.15E-117
|
Genes associated with 59,732 HSRS
|
3525
|
2794
|
1.26
|
1.66E-94
|
Genes associated with 12,262 created de novo HSRS
|
1910
|
920
|
2.08
|
1.12E-120
|
Genes associated with HSRS (except genes linked with de novo HSRS)
|
1615
|
1874
|
0.86
|
5.35E-09
|
Genes linked with LTR7 & MLT2A1 & MLT2A2
|
5444
|
5444
|
1.00
|
NA
|
Genes linked with Naïve & Primed hESC functional enhancers
|
3040
|
3354
|
0.91
|
1.10E-09
|
Genes linked with 1619 human-specific Naïve & Primed hESC functional enhancers
|
814
|
373
|
2.18
|
3.26E-43
|
Genes linked with TE loci expressed in human adult DLPFC
|
4147
|
3854
|
1.08
|
2.24E-10
|
Genes linked with 4690 human-specific TE loci expressed in human adult DLPFC
|
2306
|
1216
|
1.90
|
7.80E-119
|
Genes associated with 59,732 HSRS
|
4782
|
3830
|
1.25
|
5.12E-114
|
Genes associated with 12,262 created de novo HSRS
|
2361
|
1261
|
1.87
|
5.66E-112
|
Genes associated with HSRS (except genes linked with de novo HSRS)
|
2421
|
2569
|
0.94
|
0.0046903
|
Legend: *, 2-tail p values were estimated using the Fisher's exact test. |
Identified herein consistent patterns of significant enrichment of common sets of genes comprising down-stream regulatory targets of HSRS and human embryo regulatory LTRs seem related to the 3D topological domains’ architecture of linear chromatin fibers as well as chromatin state signatures. This notion is supported by observations of significant enrichments of genes associated with rapidly evolving in humans topologically associating domains (TADs) among genes comprising down-stream regulatory targets of human embryo regulatory LTRs (Fig. 7; Supplementary Table S4). In a similar fashion, evaluation of genes associated with dynamic TADs, which are TADs undergoing marked changes of their genomic boundaries during differentiation of human cells, documented their significant enrichment among down-stream target genes of human embryo regulatory LTR elements (Fig. 7; Supplementary Table S4). It has been observed that enrichment score values of these categories of common down-stream target genes appeared placed consistently among top-scoring significantly-enriched gene sets within panels of down-stream regulatory targets of distinct families of human embryo regulatory LTRs.
Genes identified herein as candidate down-stream regulatory targets of distinct families of human embryo regulatory LTR elements appear related to a significant degree to down-stream target genes of DNA sequences defined as genomic regulatory elements based on their chromatin state’s signatures, namely 8,297 fixed human-specific regulatory regions (Marnetto et al., 2014) and 7,598 hESC developmental enhancers (Rada-Iglesias et al., 2011). Results of these analyses documented significant enrichment of genes comprising down-stream targets of chromatin state-defined regulatory DNA sequences, revealing genome-wide connectivity components of human embryo regulatory LTR elements and their putative target genes that are recapitulated, in part, within genomic regulatory networks governed by 8,297 fixed human-specific regulatory regions and 7,598 hESC developmental enhancers defined by chromatin state signatures. These findings imply that subsets of genes comprising down-stream targets of regulatory LTRs represent down-stream target genes regulated by hESC developmental enhancers and/or fixed human-specific regulatory regions implicated in control of differentiation phenotypes of human cells.
One of biological processes affecting the integrity of genomic DNA sequences during evolution is colonization and functional/structural remodeling of primates’ genomes by mobile genetic elements (MGE), including various families of TE and HERVs, most of which lost their mobility and have been co-opted to perform a variety of genomic regulatory functions. L1, Alu and SVA elements are the only transposable elements that have unequivocally been shown to be currently active in humans, as shown by de novo insertions that are responsible for genetic disorders (Cordaux and Batzer, 2009). Inevitable mutational losses of essential regulatory features of transposon insertions dictate the necessity of continuous retrotransposition during evolution. Latest examples of structural/functional exaptation of MGEs likely reflecting relatively recent events of genomic evolution of present day Modern Humans have been mapped based on identification and characterization of fixed insertions of MGEs that were most recently active in human genome (Autio et al., 2021). It has been observed that down-stream target genes of 1937 genomic regulatory elements constituting fixed DNA insertions of most recently active MGEs are significantly enriched among down-stream target genes of human embryo regulatory LTRs of distinct HERV families. Overall, a majority of genes (536 of 926 genes; 58%) comprising down-stream regulatory targets of fixed MGE loci most recently active in human genome appears selected from genes identified in this study as putative down-stream targets of distinct families of human embryo regulatory LTR elements. These findings indicate that GRNs governed by different families of human embryo regulatory LTRs and insertions of MGEs that were most recently active in human genome share a common set of down-stream regulatory targets comprising 536 genes.
Comparative analyses of genome-wide connectivity maps of genomic coordinates of human embryo regulatory LTRs, human-specific regulatory elements, and their candidate down-stream target genes imply concordant patterns of their inferred phenotypic impacts and suggest a genomic regulatory mechanism contributing to emergence of human-specific traits during primate evolution. Considering evolutionary timelines of introduction and expansion of distinct HERVs’ families within the primate lineage vis-à-vis human-specific genomic regulatory sequences, these observations are consistent with the idea that human embryo regulatory LTR elements have made important contribution to creation of genomic fingerprints guiding the evolution of regulatory DNA features facilitating the emergence of human-specific phenotypic traits.
Down-stream target genes of human embryo regulatory LTR elements are significantly enriched among genes manifesting species-specific expression mapping bias in Human-Chimpanzee hybrids of induced pluripotent stem cells and brain organoids.
Dissection of genomic and molecular anatomy of regulatory mechanisms defining development and manifestations of human-specific phenotypic traits remains a highly significant challenge despite recent significant progress in explorations of species-specific developmental features using primate-derived induced pluripotent stem cells (iPSC) and brain organoids, which were particularly useful for identification of species-specific differences of gene expression (Camp et al., 2015; Gallego Romero. et al., 2015; Glinsky, 2020; Kanton, et al., 2019; Kronenberg et al., 2017; Lancaster et al., 2013; Mora-Bermudez et al., 2016; Muchnik et al., 2019; Otani et al., 2016; Pasca et al., 2015; Pasca, 2018; Pollen et al., 2019; Prescott et al., 2015; Qian et al., 2019; Sloan et al., 2017). Most recently, human-chimpanzee hybrid iPSC (hyiPSC) has been developed and utilized to study species-specific gene expression differences in hyiPSC and hyiPS cells differentiated into region-specific brain organoids termed hybrid cortical spheroids, hyCS (Agoglia et al., 2021; Gokhman et al., 2021). Stable tetraploid hybrids of human and chimpanzee cells enabled to control for non-genetic confounding factors such as different timing of developmental dynamics and differentiation processes, micro-environmental fluctuations, differences in cellular compositions, experimental batch effects, which are limited the utility of other experimental approaches aiming to disentangle species-specific divergence of gene expression. Significantly, cross-species tetraploid hybrid cells allowed to unambiguously identify species-specific gene expression differences caused specifically by genomic cis-regulatory divergence between species, which include promoters and enhancers affecting only down-stream target genes residing on the same species-specific DNA molecules (Agoglia et al., 2021; Gokhman et al., 2021). Therefore, it was logical to test the hypothesis that human embryo regulatory LTR elements may contribute to emergence of human-specific phenotypic traits by evaluating their potential regulatory effects on genes manifesting species-specific expression differences in human-chimpanzee hybrid cells.
To this end, individual binary categories of 5444 down-stream regulatory targets of human embryo regulatory LTRs with three different sets of genes manifesting species-specific gene expression biases in human-chimpanzee hybrid cells were selected, then common (overlapping) sets of genes were identified and significance of enrichment was estimated of observed numbers of common genes compared to corresponding expected numbers based on the random distribution model (Fig. 8; Table 13). Notably, consistent patterns of significant enrichment of human embryo regulatory LTR target genes among three different categories of genes manifesting species-specific biases of gene expression in human-chimpanzee hybrid cells (Fig. 8; Table 13). There are 95 genes putatively regulated by human embryo LTRs among 374 genes with species-specific expression mapping bias in hyiPSC of human-chimpanzee hybrids (Table 13; Fig. 8), which constitute 2.2-fold enrichment compared to expected numbers of common genes (p = 9.58E-14). Similarly, there was observed the 1.9-fold (p = 5.06E-07) enrichment of human embryo LTRs’ down-stream target genes among genes exhibiting species-specific expression mapping bias in human-chimpanzee hybrids of Cortical Spheroid (Brain Organoids), hyCS (Table 13; Fig. 8). Finally, a total of 56 down-stream target genes of human embryo regulatory LTRs constitute 30.6% of genes manifesting species allele-specific expression profiles in human-chimpanzee hybrid cells (Table 13), which represents a 2.7-fold enrichment (p = 4.36E-12) compared to expected by chance alone numbers. Examination of expression profiles of genes with species-specific expression bias in human-chimpanzee hybrids revealed the apparently discordant patterns of their expression in both hyiPSC (Fig. 8; panels A and B) and hyCS brain organoids’ settings (Fig. 8; panels C and D) supporting the idea that species-specific differences in gene expression may contribute to phenotypic divergence of Humans and Chimpanzee in regulation and manifestation of stemness phenotype and cortical differentiation.
Table 13
Enrichment of human embryo regulatory LTR down-stream target genes among genes with species-specific expression mapping bias in hybrids of Human and Chimpanzee cells.
Classification category
|
Number of genes
|
Percent
|
Genes with expression mapping bias in hybrid induced Pluripotent Stem (iPS) Cells of Human and Chimpanzee
|
374
|
100
|
Genes regulated by human embryo regulatory LTR elements
|
95
|
25.40
|
Expected by chance
|
44
|
|
Enrichment
|
2.2
|
|
P value*
|
9.58E-14
|
|
Genes with expression mapping bias in hybrid Cortical Spheroids (Brain Organoids) of Human and Chimpanzee
|
291
|
100
|
Genes regulated by human embryo regulatory LTR elements
|
63
|
21.65
|
Expected by chance
|
34
|
|
Enrichment
|
1.9
|
|
P value*
|
5.06E-07
|
|
Genes manifesting allele-specific expression in hybrids of Human and Chimpanzee cells
|
183
|
100
|
Genes regulated by human embryo regulatory LTR elements
|
56
|
30.60
|
Expected by chance
|
21
|
|
Enrichment
|
2.7
|
|
P value*
|
4.36E-12
|
|
Legend: * p values were estimated using the hypergeometric distribution test. |
Overall, these analyses revealed that a significant fraction of genes with species-specific expression bias in human-chimpanzee hybrid cells represents down-stream target genes of human embryo regulatory LTRs (177 of 676 genes; 26%), indicating that regulatory effects of regulatory LTR elements may have affected species segregation during the evolution of Great Apes by contributing to phenotypic divergence of Modern Humans. It was of interest to determine what morphological structures and phenotypic traits were preferred targets of regulatory LTR elements during Modern Human divergence during primate evolution. This question was addressed by performing GSEA of identified herein 177 genes (Supplementary Table S5) across multiple genomic database (Methods). GSEA of 177 genes employing the ARCHS4 Human Tissues database identified 23 records with significant enrichment of human embryo regulatory LTRs’ target genes (adjusted p value < 0.05), among which the Dentate Granule Cells were the top-scoring term (adjusted p value = 3.62E-09). A total of 53 genes linked with 223 human embryo regulatory LTRs were identified among genes constituting genetic markers of the Dentate Granule Cells (Fig. 8E). In addition to the Dentate Granule Cells, prefrontal cortex (52 genes), cingulate gyrus (51 genes), cerebellum (48 genes), and cerebral cortex (48 genes) were identified among the top 5 scoring records (Fig. 8E).
To gain insights into the broader developmental context of biological functions of genes associated with 223 human embryo regulatory LTR elements, the DAG analyses were carried out employing the GREAT algorithm (Methods). Notably, GO annotations of the Mouse Single KO database generated local DAG which identified abnormal synaptic transmission (MP:0003635; linked with 66 LTRs and 17 genes; Binominal FDR Q-value = 1.49E-15); abnormal learning/memory/conditioning (MP:0002063) and abnormal cognition (MP:0014114; both terms linked with 49 LTRs and 14 genes; Binominal FDR Q-value = 7.98E-08); abnormal excitatory postsynaptic currents (MP:0002910; 19 LTRs linked with 6 genes; Binominal FDR Q-value = 2.08E-06); abnormal synaptic vesicle morphology (MP:0004769; 11 LTRs linked with 5 genes; Binominal FDR Q-value = 5.37E-06); and abnormal CNS synaptic transmission (MP:0002206; linked with 65 LTRs and 16 genes; Binominal FDR Q-value = 3.07E-17) among 250 significantly enriched functional and developmental anomalies caused by knockout of individual genes associated with 223 human embryo regulatory LTRs (Fig. 8; panels F and G).
Taken together, reported observations indicate that genes manifesting species-specific expression bias in human-chimpanzee hybrid cells and linked by putative regulatory associations with human embryo regulatory LTR elements are implicated in higher cognitive functions of learning and memory; biological processes of CNS synaptic transmission, excitatory postsynaptic currents, and neuroactive ligands-receptors interactions; gene expression markers of numerous morphological components of human brain, including the Dentate Granule Cells, prefrontal cortex, cingulate gyrus, cerebellum, cerebral cortex, superior frontal gyrus, dorsal striatum, motor and sensory neurons, midbrain, oligodendrocytes, and neuronal epithelium (Table 14; Fig. 8; Supplementary Table S5). Additionally, this family of genes represents gene expression markers of adult human somatic tissues (small intestine; testis; alpha cells; retina) and morphological elements of human embryogenesis namely fetal brain cortex, oocyte, and zygote (Table 14; Supplementary Table S5).
Table 14
Summary of significantly enriched phenotypic traits and morphological elements associated with 177 down-stream target genes of human embryo regulatory LTRs manifesting species-specific expression bias in human-chimpanzee hybrids.
KEGG 2021
Human Database
|
Human embryo regulatory LTR target genes
|
Adjusted P-value
|
Genes manifesting species-specific expression bias
|
Adjusted P-value
|
Percent, LTR regulated
|
Relative enrichment*
|
Neuroactive ligand-receptor interactions
|
14
|
4.50E-04
|
24
|
0.116711
|
58
|
2.2
|
HuBMAP ASCTplusB augmented 2022 Database
|
Human embryo regulatory LTR target genes
|
Adjusted P-value
|
Genes manifesting species-specific expression bias
|
Adjusted P-value
|
Percent, LTR regulated
|
Relative enrichment*
|
Inh L1 SST DEFB108B - Brain
|
8
|
8.29E-04
|
8
|
0.387543
|
100
|
3.8
|
Inh L1-2 VIP PTGER3 - Brain
|
8
|
8.29E-04
|
9
|
0.207386
|
89
|
3.4
|
Inh L1-3 SST FAM20A - Brain
|
7
|
0.004064
|
11
|
0.129625
|
64
|
2.4
|
Inh L1-2 PVALB CDK20 - Brain
|
7
|
0.004064
|
13
|
0.028948
|
54
|
2.1
|
Inh L1-2 VIP SCML4 - Brain
|
6
|
0.02756
|
10
|
0.129625
|
60
|
2.3
|
ARCHS4 Human Tissues Database
|
Human embryo regulatory LTR target genes
|
Adjusted P-value
|
Genes manifesting species-specific expression bias
|
Adjusted P-value
|
Percent, LTR regulated
|
Relative enrichment*
|
DENTATE GRANULE CELLS
|
53
|
3.62E-09
|
119
|
6.25E-05
|
45
|
1.7
|
PREFRONTAL CORTEX
|
52
|
6.03E-09
|
118
|
7.90E-05
|
44
|
1.7
|
CINGULATE GYRUS
|
51
|
1.30E-08
|
109
|
0.001818
|
47
|
1.8
|
CEREBELLUM
|
48
|
1.40E-07
|
98
|
0.062407
|
49
|
1.9
|
CEREBRAL CORTEX
|
48
|
1.40E-07
|
116
|
1.74E-04
|
41
|
1.6
|
SPINAL CORD
|
48
|
1.40E-07
|
115
|
2.04E-04
|
42
|
1.6
|
SPINAL CORD (BULK)
|
48
|
1.40E-07
|
115
|
2.04E-04
|
42
|
1.6
|
SUPERIOR FRONTAL GYRUS
|
48
|
1.40E-07
|
136
|
5.62E-09
|
35
|
1.4
|
BRAIN (BULK)
|
47
|
3.62E-07
|
125
|
3.29E-06
|
38
|
1.4
|
DORSAL STRIATUM
|
45
|
2.51E-06
|
112
|
7.31E-04
|
40
|
1.5
|
MOTOR NEURON
|
41
|
9.55E-05
|
111
|
9.16E-04
|
37
|
1.4
|
MIDBRAIN
|
38
|
0.001047
|
97
|
0.076015
|
39
|
1.5
|
SMALL INTESTINE (BULK TISSUE)
|
37
|
0.002074
|
76
|
0.999994
|
49
|
1.9
|
FETAL BRAIN
|
36
|
0.003502
|
109
|
0.001818
|
33
|
1.3
|
OLIGODENDROCYTE
|
36
|
0.003502
|
100
|
0.039555
|
36
|
1.4
|
TESTIS (BULK TISSUE)
|
36
|
0.003502
|
111
|
9.16E-04
|
32
|
1.2
|
ALPHA CELLS
|
35
|
0.005636
|
101
|
0.032484
|
35
|
1.3
|
FETAL BRAIN CORTEX
|
35
|
0.005636
|
96
|
0.092408
|
36
|
1.4
|
RETINA
|
35
|
0.005636
|
99
|
0.05124
|
35
|
1.4
|
SENSORY NEURON
|
35
|
0.005636
|
92
|
0.229934
|
38
|
1.5
|
NEURONAL EPITHELIUM
|
33
|
0.018769
|
79
|
0.999994
|
42
|
1.6
|
OOCYTE
|
33
|
0.018769
|
100
|
0.039555
|
33
|
1.3
|
HUMAN ZYGOTE
|
32
|
0.032657
|
91
|
0.268966
|
35
|
1.4
|
SPERM
|
25
|
0.514091
|
102
|
0.024863
|
25
|
0.9
|
Legend: *Relative enrichment values were estimated by dividing observed values by expected values in corresponding category based on 26% representation of all regulatory LTR down-stream target genes among all genes manifesting species-specific expression bias in human-chimpanzee hybrids. |
The hypothesis that down-stream target genes of human embryo regulatory LTRs may affect human-specific gene expression networks operating in adult human brain was further validated by observations that human embryo regulatory LTRs’ target genes are significantly enriched among genes expression of which demarcates human-specific features of spatial gene expression profiles delineated by comparative analyses of eight brain regions of humans, chimpanzees, gorillas, a gibbon, and macaques (Xu et al., 2018). Overall, a significant fraction of genes comprising down-stream target of human embryo regulatory LTRs represents genes implicated in differential transcriptomic patterns of eight regions of primates’ brains (1968 of 5444 genes; 36.2%; p = 1.03604E-20; hypergeometric distribution test). It has been observed that human embryo regulatory LTRs’ target genes constitute 35.2% of genes manifesting human-specific expression differences in examined eight brain regions in contrast to 25.4% genes exhibiting chimpanzee-specific expression differences (p = 0.026; 2-tail Fisher’s exact test). Furthermore, detailed analyses of expression profiles of genes assigned to 26 distinct gene expression modules of primates’ brain identified 3 modules (modules 22; 16; and 15; Fig. 8H) that manifest most significant relative enrichment of human embryo regulatory LTR’s target genes in contrast to 7 modules (modules 4; 7–11; 18) exhibiting most significant relative depletion (Fig. 8H). Notably, two of three modules with most significant relative enrichment (human-specific module 22 and module 16) of human embryo regulatory LTRs’ down-stream target genes are known to contain the disproportionally high numbers of genes with hippocampus-specific expression (Xu et al., 2018). These findings indicate that hippocampus-specific genes may constitute one of preferred regulatory targets of human embryo regulatory LTR elements.
Collectively, reported herein observations support the hypothesis that genes comprising putative down-stream regulatory targets of human embryo regulatory LTRs contributed to phenotypic divergence of Modern Humans during the primate evolution and may have impacted the emergence of uniquely humans’ phenotypic traits. Identification of hippocampus and more specifically Dentate Granule Cells among significant morphological targets of preference of regulatory effects of human embryo regulatory LTR elements may assist in precise experimental definition of important morphological structures of human brain having evolutionary shaped critical impacts on human-specific features of higher cognitive functions.