Role of Transposable Elements in behavioral traits: insights from six genetic isolates from North-Eastern Italy

doi:10.21203/rs.3.rs-3985238/v1

Research Article

This work is licensed under a CC BY 4.0 License

Background

A significant fraction of mammalian genomes derives from transposable element (TE) sequences, which constitute about half of the human genome; retrotransposons such as Alu, LINE-1 and SVA are particularly abundant, and some of them also have functional roles. Germline transposition of these elements generates polymorphisms between individuals that can be used to study association with phenotypes, inter-individual differences and natural selection. Italy has a higher number of isolated villages and subpopulations than other European groups, and these isolates provide a desirable study subject to understand the genetic variability of the Italian peninsula. Therefore, we focused on studying the association between polymorphic TEs, behavioral traits (tobacco use and alcohol consumption) and Body Mass Index (BMI) variations, which could lead to an increased risk of developing addiction-related or metabolic diseases, such as tobacco use disorder, alcoholism and obesity.

Results

We identified 12,709 polymorphic TEs in 586 individuals from six isolates. Principal Component Analysis and Admixture showed that, while closely related to other European populations, the isolates tend to cluster amongst themselves and are dominated by drift-induced ancestral components. In association tests with GEMMA, several TEs were significantly associated with a behavioral trait (tobacco use or alcohol consumption) or with BMI variations. Finally, some of the significant TEs also act as expression/alternative splicing quantitative trait loci.

Conclusions

These results suggest that polymorphic TEs may significantly impact inter-individual and inter-population phenotypic differentiation, while also effectively functioning as variability markers and potentially having a role in susceptibility to medical conditions. In light of these results, isolates could be used as a “laboratory” to investigate this impact and further our understanding about the role of TEs on the human genome.

genetic isolates

polymorphic transposable elements

GWAS

association study

behavioral traits

The Italian peninsula, due to its complex population structure, could play an important role in understanding the genetic diversity of current populations, having been the natural crossroads for human migrations across the Mediterranean since prehistoric periods. These migration patterns left a tangible mark on present-day Italians, revealing a heterogeneous network of genomic landscapes across the peninsula, with North Italian groups being more closely related to Western/Eastern European populations and a progressively increasing genetic connection with Northern African and Middle Eastern populations as we move southwards [1]. On top of this clinal variation across the peninsula, the natural variety of environments [2] provoked a series of local adaptive events that determined, among other factors, a differential disease susceptibility of Italian subpopulations [1]. A refined understanding of these local events would improve our knowledge of human diversity as a whole and, on a more practical level, allow us to provide more ad hoc medical care and measures to particularly susceptible subpopulations.

The underlying genetic variability of Italy remains under-sampled and underrepresented, with available human genome reference datasets such as the 1KGP, HGDP and SGDP only sampling three populations for the whole peninsula: Tuscans (TSI), individuals from Bergamo and Sardinians. This gap is all the more serious considering that Italy has a higher number of isolated villages and subpopulations than other European groups [3, 4], most of which remain uncharacterized.

These still isolated groups provide a desirable study subject to understand the Italian genetic variability: population isolates are characterized by small effective population sizes (Ne), which result in a decreased variability and stronger genetic drift effects, potentially increasing the frequency of variants that are rare or absent elsewhere and aiding the discovery of novel rare variant signals underpinning complex traits such as medical risks and susceptibilities [5]. Population isolates tend not only to be genetically homogeneous, but also to present an elevated diversity when compared to neighboring populations and their source population [3], because of the geographical or cultural barriers that are necessary for the formation of the isolate in the first place. In addition, isolates have proven to be useful tools for genome-wide association studies [6].

However, most studies to date are based on SNP data, and little research has used other types of genetic markers such as Transposable Elements (TEs). In this work, we focused on studying the association between non-reference polymorphic TEs, behavioral traits (tobacco use and alcohol consumption) and Body Mass Index (BMI) variations, which could lead to an increased risk of developing addiction-related or metabolic diseases, such as tobacco use disorder, alcoholism and obesity. To better understand the impact of these polymorphic TEs and how they can affect diversity in isolated communities, we also performed population genetics analyses such as PCA and Admixture using polymorphic TE markers and compared the results to other European and worldwide populations.

The dataset used in this study was generated in 2008 [3–5] from the sampling of 611 individuals from six geographically and historically isolated villages in the Friuli-Venezia Giulia region of North-Eastern Italy (Fig. 1). A few of the individuals present in the dataset were duplicates (specifically, 22 individuals from Resia), and three individuals were present in the dataset but could not be traced back to any individual or village; these individuals were removed, bringing the total number of analyzed individuals to 586. During the sampling, subjects were asked to fill out an anamnesis form to acquire more data on their general health and lifestyle habits, including information about alcohol consumption and smoking, as well as their height and weight, from which we calculated the corresponding BMI (weight/height²).

Genomes were scanned for non-reference polymorphic TEs (Alus, LINE1s, and SVAs) using the Mobile Element Locator Tool (MELT) v2.2.2 [7]. The WGS data were aligned with bwa to the human reference HumanG1Kv38, and the aligned reads were used as input for MELT. For the calling process, we used the MELT mobile element reference sequences and the collection of insertion sites discovered in Phase III of the 1000 Genomes Project (1KGP) as analysis priors. After the identification of these TEs, a custom Python script was applied to the resulting vcf files to calculate both allele and genotype frequencies of each TE for every isolated village. Allele frequencies were then analyzed for significant differences between villages with Fisher’s exact test, using a significance threshold of nominal p-value < 0.01 (“differentiated” TEs).
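The frequency calculation and village comparison described above can be sketched in Python as follows. This is a minimal illustration and not the authors' script: the function names are hypothetical, and the test is shown as a pairwise 2×2 comparison of insertion versus reference allele counts between two villages, which is an assumption, since the text does not specify how the six villages were compared. The genotype counts are the GRIN2B values for Resia and Clauzetto reported in Table 2.

```python
from collections import Counter
from math import comb


def genotype_counts(calls):
    """Count 0/0, 0/1 and 1/1 calls, skipping missing genotypes such as './.'."""
    counts = Counter()
    for call in calls:
        alleles = call.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call
        counts["/".join(sorted(alleles))] += 1
    return {gt: counts.get(gt, 0) for gt in ("0/0", "0/1", "1/1")}


def allele_counts(counts):
    """Return (insertion alleles, reference alleles) from genotype counts."""
    insertion = counts["0/1"] + 2 * counts["1/1"]
    reference = counts["0/1"] + 2 * counts["0/0"]
    return insertion, reference


def insertion_allele_frequency(counts):
    """Frequency of the insertion allele among all called alleles."""
    insertion, reference = allele_counts(counts)
    return insertion / (insertion + reference)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# GRIN2B genotype counts from Table 2.
resia = {"0/0": 69, "0/1": 80, "1/1": 41}
clauzetto = {"0/0": 47, "0/1": 66, "1/1": 6}

p_value = fisher_exact_2x2(*allele_counts(resia), *allele_counts(clauzetto))
differentiated = p_value < 0.01  # nominal threshold used in the text
```

A comparison across all six villages at once would need an r×c generalization of the test; the pairwise form is used here only to keep the sketch dependency-free.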

MELT provides gene names in RefSeq format: therefore, RefSeq accession numbers were converted to their respective Official Gene Symbol using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/) [8], taking into consideration the specific gene region TEs were inserted in (Intron, Exon, Promoter, Terminator, 5’ UTR and 3’ UTR).

To compare TE diversity of the isolates with other human populations, we built a new dataset consisting of polymorphic TEs identified with MELT that were present both in the six isolates and the populations of the 1000 Genomes Project [9]. This newly merged dataset contained a total of 2,814 variants for 3,090 individuals from 32 populations. The populations were divided into 6 groups based on geographic macro areas, consistent with the super populations of 1KGP [9], i.e. Africa (AFR), America (AMR), East Asia (EAS), Europe (EUR), South Asia (SAS), plus the isolates from Friuli-Venezia Giulia (FVG).

TEs were coded as single nucleotide variants, substituting the insertion with a nucleotide base that was non-complementary to that of the other allele in the genotype file. Information about the true nature of each insertion was kept in the original vcf file. Variants were then filtered with PLINK v1.9 [10] as follows:

1) Removal of insertions located on sex chromosomes or in the mitochondrial genome, to retain only autosomal variability, and removal of duplicates, using the --exclude option.

2) Exclusion of individuals and variants with > 1% missing data with the commands --geno 0.01 (for variants) and --mind 0.01 (for individuals).

3) Removal of variants that did not respect the Hardy-Weinberg Equilibrium (HWE) with the option --hwe, setting a significance threshold of 0.01 with a Bonferroni correction for multiple testing (threshold = 0.01/number of variants).

4) Removal of variants with a minor allele frequency < 0.01 (--maf 0.01).

5) Removal of closely related individuals with an Identity by Descent (IBD) estimate higher than 0.25, using the --genome option to calculate the pairwise IBD estimates between every pair of individuals and --remove to exclude one of the two related individuals.

The final filtered dataset consisted of 1,703 variants shared among 3,087 individuals.
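The variant-level filters (missingness, minor allele frequency and Bonferroni-corrected HWE) can be illustrated with a short Python sketch. It is not a substitute for PLINK: the function names are hypothetical, and the HWE p-value is computed with a one-degree-of-freedom chi-square approximation, a simplification swapped in here for brevity, whereas PLINK's --hwe option relies on an exact test.

```python
from math import erfc, sqrt


def hwe_chi2_pvalue(n_ref_hom, n_het, n_alt_hom):
    """Chi-square (1 d.f.) approximation of the HWE test p-value."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no deviation can be measured
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with one degree of freedom.
    return erfc(sqrt(chi2 / 2.0))


def passes_variant_filters(n_ref_hom, n_het, n_alt_hom, n_missing, n_variants,
                           max_missing=0.01, min_maf=0.01, hwe_alpha=0.01):
    """Apply the missingness, MAF and Bonferroni-corrected HWE thresholds."""
    n_called = n_ref_hom + n_het + n_alt_hom
    if n_called == 0:
        return False
    if n_missing / (n_called + n_missing) > max_missing:
        return False  # more than 1% missing genotypes
    alt_freq = (n_het + 2 * n_alt_hom) / (2 * n_called)
    if min(alt_freq, 1.0 - alt_freq) < min_maf:
        return False  # minor allele frequency below 0.01
    hwe_threshold = hwe_alpha / n_variants  # Bonferroni correction
    return hwe_chi2_pvalue(n_ref_hom, n_het, n_alt_hom) >= hwe_threshold


# Illustrative genotype counts (not study data); 2,814 is the size of the
# merged dataset before filtering, used here as the number of tests.
keeps_balanced_site = passes_variant_filters(25, 50, 25, 0, n_variants=2814)
keeps_all_het_site = passes_variant_filters(0, 100, 0, 0, n_variants=2814)
```

A site where every individual is heterozygous deviates strongly from HWE and is dropped, while a site with genotype counts in exact HWE proportions is kept.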

The generated dataset was then used to perform a series of analyses on TE insertions from the six isolates in comparison with the 1KGP groups. Both a Principal Component Analysis (PCA) and an Admixture analysis were applied: PCA was performed with smartpca after converting the PLINK files (bed, bim, fam) with convertf, both tools being part of the EIGENSOFT v6.0.1 package [11]. Admixture was implemented with the ADMIXTURE tool [12], testing between 2 and 23 potential ancestry components (K) and performing 50 iterations of each run to minimize the estimation error and maximize the log-likelihood of each ancestry estimate.

We then compared FVG isolates with other European populations, subsampling the original 1KGP dataset: Utah residents with North-Western European ancestry (CEU), Finnish in Finland (FIN), British in England and Scotland (GBR), Iberian populations in Spain (IBS), Tuscans in Italy (TSI). PCA and Admixture analyses were implemented using the above approach, the only difference being that we tested a number K of putative ancestry components between 2 and 12.

As described above, individuals were asked to fill out an anamnesis form, including information on their health status and lifestyle habits. Three phenotypes were selected to perform association studies between polymorphic TEs and the considered traits: tobacco use, alcohol consumption and body mass index (the latter was calculated as weight/height²). The genome-wide association studies (GWAS) were performed with the software GEMMA [13, 14], applying for all three phenotypes a multivariate linear mixed model (mvLMM) for association tests between a marker and multiple phenotypes, while also checking for stratification and estimating genetic correlation among phenotypes [14]. GEMMA was used on the full FVG dataset (12,709 TEs and 586 individuals). Three separate mvLMM association analyses were performed, using sex and age as covariates: 1) BMI; 2) a binary alcohol drinker/non-drinker variable (set as “1” for drinkers and “0” for non-drinkers); 3) a binary smoker/non-smoker variable (using “1” for smokers and “0” for non-smokers). Variables were tested using Wald’s test with a significance threshold of p-value < 0.001; Manhattan plots with significant results (highlighted and annotated in green) were obtained with the qqman R package (https://github.com/stephenturner/qqman). We then cross-checked the significant results with the lists of polymorphic TEs acting as expression/alternative splicing quantitative trait loci produced by Cao and colleagues [15], to investigate a possible function for the identified TEs. For each gene analysed we collected measures of genetic constraint such as pLI [16], RVIS [17] and SSC score [18] for prioritization. We considered a gene constrained if it had pLI > 0.9, RVIS (%) < 25% or SSC score < -2.
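The constraint criterion stated above amounts to a logical OR across the three scores. A minimal Python sketch, assuming that any of the scores may be unavailable for a given gene (the function name and the example values are hypothetical and are not taken from the study):

```python
from typing import Optional


def is_constrained(pli: Optional[float] = None,
                   rvis_percentile: Optional[float] = None,
                   ssc_score: Optional[float] = None) -> bool:
    """Flag a gene as constrained if pLI > 0.9, RVIS (%) < 25 or SSC score < -2."""
    return (
        (pli is not None and pli > 0.9)
        or (rvis_percentile is not None and rvis_percentile < 25)
        or (ssc_score is not None and ssc_score < -2)
    )


# Made-up scores for illustration only.
example_flags = [
    is_constrained(pli=0.95, rvis_percentile=60.0, ssc_score=0.3),
    is_constrained(pli=0.10, rvis_percentile=60.0, ssc_score=0.3),
]
```

Treating a missing score as "not constrained on that measure" is a design choice of this sketch; a gene is flagged as soon as any available score crosses its threshold.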

TE variation distribution

The analysis of polymorphic non-reference TEs with MELT v2.2.2 [7] retrieved a total of 9,525 Alus, 2,283 LINE1s and 901 SVAs.

Then, allele frequencies were scanned for significant differences between the isolates: this way, a total of 3,987 TEs (3,195 Alus, 636 LINE1s, and 156 SVAs) were identified as “differentiated” TEs. Of these significant insertions, we also considered their location (Table 1).

Table 1

Significantly different polymorphic TEs between the six villages. TEs are divided by insertion location relative to gene region (with percentages) and TE superclass.
Location	Alu	LINE-1	SVA
INTRONIC	1281 (40.1%)	242 (38%)	65 (41.7%)
PROMOTER	138 (4.3%)	23 (3.6%)	10 (6.4%)
TERMINATOR	106 (3.3%)	28 (4.4%)	11 (7%)
EXON	38 (1.2%)	6 (1%)	6 (3.8%)
3’-UTR	39 (1.2%)	6 (1%)	3 (1.9%)
5’-UTR	18 (0.6%)	2 (0.3%)	0
INTERGENIC	1575 (49.3%)	329 (51.7%)	61 (39.1%)
TOTAL	3195	636	156

While all insertion frequencies behave as expected, with most polymorphic TE insertions being located in intronic and intergenic regions and only a negligible fraction located in exonic regions (Supplementary Table S1), it is interesting to note that SVAs, which can be up to 3 kb long [19], are overall less frequent in intergenic sequences and appear more often inside “functional” regions (regulators or exons) when compared to Alus and LINE-1s. This finding is consistent with the notion that SVA insertions have the innate potential to regulate gene expression through their insertion location and their sequence characteristics [20].
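The comparison between TE classes can be reproduced directly from the counts in Table 1. The sketch below is illustrative: it assumes that the “functional” regions correspond to the promoter, terminator, exon and UTR rows of the table, a grouping that is not spelled out in the text, and the names used are hypothetical.

```python
# Counts of "differentiated" TEs per gene region, copied from Table 1.
TABLE_1 = {
    "Alu": {"INTRONIC": 1281, "PROMOTER": 138, "TERMINATOR": 106, "EXON": 38,
            "3'-UTR": 39, "5'-UTR": 18, "INTERGENIC": 1575},
    "LINE-1": {"INTRONIC": 242, "PROMOTER": 23, "TERMINATOR": 28, "EXON": 6,
               "3'-UTR": 6, "5'-UTR": 2, "INTERGENIC": 329},
    "SVA": {"INTRONIC": 65, "PROMOTER": 10, "TERMINATOR": 11, "EXON": 6,
            "3'-UTR": 3, "5'-UTR": 0, "INTERGENIC": 61},
}

# Assumption: regions counted as "functional" (regulators, exons and UTRs).
FUNCTIONAL_REGIONS = ("PROMOTER", "TERMINATOR", "EXON", "3'-UTR", "5'-UTR")


def region_share(te_class, regions):
    """Fraction of a TE class's insertions that fall in the given regions."""
    counts = TABLE_1[te_class]
    return sum(counts[region] for region in regions) / sum(counts.values())


functional_share = {te: region_share(te, FUNCTIONAL_REGIONS) for te in TABLE_1}
intergenic_share = {te: region_share(te, ("INTERGENIC",)) for te in TABLE_1}
```

Under this grouping, about 19% of differentiated SVAs fall in functional regions, against roughly 11% of Alus and 10% of LINE-1s.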

TEs as markers for population structure

PCA and Admixture showed that while closely related to other European populations, the isolates tend to cluster amongst themselves and are dominated by drift-induced ancestry components (Supplementary Figure S1); the first PC discriminates between African and non-African populations, while the second PC highlights a West-to-East geographical pattern including individuals from Friuli-Venezia Giulia, Europeans, Americans, South Asians and East Asians.

The PCA between European and FVG populations divides the two groups along the first PC, while the second component highlights the genetic variability between the isolates, dividing Resia and some individuals from Clauzetto and Sauris (Fig. 2A). As expected considering their history, Tuscans (TSI) and Utah residents with North-Western European ancestry (CEU) are the closest groups to the FVG isolates; this PCA is similar to the SNP-based one reported by Esko et al. (2013) [3]. Looking at the second and third PCs, PC2 divides Resia from Clauzetto, while the third component divides Sauris from Illegio. Moreover, Erto, San Martino, and most individuals from Clauzetto all cluster together with the other European populations (Fig. 2B). Finally, looking at the Admixture graph (Fig. 2C), the “tidiest” model is K = 7, as the best-fitting model, K = 9 (CV error = 0.31088), presents excessive noise, especially in the African outgroup. Even so, at K = 9, Illegio, Resia, San Martino and Sauris are all dominated by their own ancestry components, which are present only marginally in Clauzetto, Erto and the other European populations.

Association studies

Several polymorphic TEs were identified by the association tests with GEMMA [13, 14] as possibly associated with the conditions detailed in Materials and Methods, and some of them also act as eQTLs/sQTLs:

1) Variations in Body Mass Index: four insertions were deemed significant, three Alus and one SVA (Fig. 3). Two of these are located in genes: an Alu on chr10:15209391 in the gene NMT2 (N-myristoyltransferase 2) and the SVA on chr17:49150166 in the gene SPAG9 (Sperm Associated Antigen 9). Interestingly, all individuals analyzed carry the Alu in a heterozygous condition, except 53 (9%) who do not carry the insertion (Table 2). The other two significant results are two intergenic Alus located on chr1:241980544 and chr14:65796449, respectively: notably, both act as eQTLs and sQTLs in several tissues, such as pituitary, blood, heart, testis and ovary.

2) Alcohol consumption: six Alus were found to be significant (Fig. 4), with only one in a genic region, the Alu on chr12:14020945 in the gene GRIN2B (Glutamate Ionotropic Receptor NMDA Type Subunit 2B). This TE was previously identified as “differentiated” among the isolates and is generally widespread in the villages (Table 2). The other five intergenic elements are all Alus located on chr6:1257163, chr6:161283170, chr12:58367298, chr13:112866653 (eQTL in testis, skin and colon sigmoid) and chr18:26214257 (sQTL in testis, adipose visceral omentum, thyroid and breast mammary tissue).

3) Tobacco use (smoking): seven TEs were deemed significant (Fig. 5), three of which are located in genes: the Alu on chr3:42856928, which acts as eQTL and sQTL in different tissues (including brain and lung) and is located in the gene ACKR2 (Atypical Chemokine Receptor 2); the Alu on chr11:102654750 in WTAPP1 (Wilms Tumor 1 Associated Protein Pseudogene 1), acting as eQTL in adrenal gland; and the Alu on chr12:129970510 in TMEM132D (Transmembrane Protein 132D). These last two Alus are mostly widespread in the six villages (Table 2). The other four intergenic insertions are located on chr2:174296971 (Alu), chr2:174296971 (Alu, acting as sQTL in brain hippocampus and cerebellum), chr14:65796449 (Alu, acting as eQTL/sQTL in several tissues), and a LINE-1 on chr22:26454699.

Among these genes, SPAG9, TMEM132D and GRIN2B show evidence of genetic constraint according to at least one of pLI score, RVIS or SSC score.

Table 2

Absolute genotype frequencies of the four Alus located in the genes NMT2, GRIN2B, ACKR2 and TMEM132D. 0/0 = absent; 0/1 = heterozygous; 1/1 = homozygous. Interestingly, the insertion in NMT2 is present in 91% of the individuals but only in a heterozygous condition.
	Resia			Erto			Illegio			Sauris			San Martino			Clauzetto
	0/0	0/1	1/1	0/0	0/1	1/1	0/0	0/1	1/1	0/0	0/1	1/1	0/0	0/1	1/1	0/0	0/1	1/1
NMT2	12	176	0	9	89	0	6	92	0	6	119	0	14	151	0	6	113	0
GRIN2B	69	80	41	43	47	8	27	49	22	52	60	15	64	90	11	47	66	6
ACKR2	74	90	24	54	36	8	42	41	15	46	59	20	77	69	19	58	54	7
TMEM132D	14	76	100	18	48	32	5	38	55	3	56	68	22	84	59	15	55	49

Studying isolated communities, the basis of population genetics studies [21, 22], allows researchers to examine genomes that show high homogeneity and are subject to similar environmental and cultural pressures, such as lifestyle habits, diet, sanitary conditions, and disease vectors. These isolates are also an ideal subject to study the phenotypic effects of variants that are otherwise only marginally present in larger populations [22]. In this picture, Italian isolates are particularly important, mainly because of the peninsula’s central role in human migrations since prehistoric times and of the high number of genetically distinct isolated communities that have been established throughout history [23]. While previous studies on isolates have relied on single nucleotide variants, in this work we used polymorphic TEs: these markers have become available only in recent years [7, 24, 25], adding a new source of data for studying the diversity of the human genome. Interestingly, such markers have never been used to study the genetic underpinnings of human isolated communities and, therefore, this study is the first of its kind.

Using the Mobile Element Locator Tool [7] more than 12,000 polymorphic TEs were identified in the six villages of Friuli-Venezia Giulia. These TEs were used as genetic markers 1) to study communities’ differentiation; 2) to explore the genetic variability of the isolates; 3) and to analyze their possible role as genetic variants underlying susceptibility to different behavioral traits or medical conditions (tobacco use, alcohol consumption and BMI variations).

Firstly, allele and genotype frequencies of the identified TEs were calculated with a custom Python script: of 12,709 TEs, 3,987 have significantly different allele frequencies between the six isolates (Fisher’s exact test, p-value < 0.01).

Then, TEs were used as markers for exploratory analyses, such as PCA and Admixture, to look at the general diversity and ancestry of the isolates in the context of European genetic variability. In the PCA between European populations from 1KGP [9] and FVG isolates (Fig. 2A), the isolates tend to cluster amongst themselves: PC1 divides Europeans and FVG subpopulations, while PC2 shows a differentiation between the isolates, which are distributed along the second principal component. In Fig. 2B, on the other hand, PC2 divides Resia from Clauzetto, while the third component divides Sauris from Illegio; furthermore, Erto, San Martino, and most individuals from Clauzetto cluster with the other European populations. Interestingly, Clauzetto is the least isolated village among the six FVG isolates [3], while Clauzetto, Erto and San Martino are genetically closest to the considered European populations and have the lowest inbreeding coefficients among the villages [4]. Similarly, in the Admixture graph, the FVG isolates are clearly distinct from the European populations and tend to be dominated by their own ancestry components. PCA and Admixture results are in line with previous studies on the same isolates performed with SNPs [3–5]; moreover, the observed patterns of genetic variability and ancestry components could be explained by genetic drift, as also suggested by previous works on the same dataset [3–5]. The strong agreement between SNP-driven and TE-driven results in terms of population structure further indicates that the genomic pattern of polymorphic TEs is mainly the result of demographic events.

In the last decades, we have come to know much more about the impact of these elements on the genome and gene networks, and it has been shown that TE insertions can generate diversity in a variety of ways. For example, transposable elements have been shown to provide polyadenylation signals that induce the termination of transcripts [26], to modify splicing patterns and provide new splicing sites [27], to epigenetically affect nearby genes [28, 29], to act as novel promoters, enhancers, and transcription factor binding sites [30, 31], and often to carry their own enhancers and promoters [32]. With their innate ability to act as disruptors and deregulators of gene expression, TE insertions have been associated with a variety of human diseases: for example, several cancer types [33, 34], hemophilia A and B [35, 36], some inheritable genetic diseases such as Dent’s disease or Duchenne muscular dystrophy [37], metabolic diseases [38], substance abuse, and central nervous system diseases [39].

In particular, much interest has been given in recent years to the impact of transposable elements on the central nervous system [39–41], jumpstarted by the NGS revolution which allowed for the efficient typing of thousands of transposable elements at once. Genome-wide approaches allowed researchers to study the role of transposable elements in stress-related learning mechanisms in rats [42], which have been used as a model for post-traumatic stress disorder (PTSD) in humans [43]. Likewise, transposable elements have also been associated with alcoholism in humans using the same genome-wide approach [39].

Three association tests with GEMMA [13, 14] were implemented, using sex and age as covariates, and testing for an association between polymorphic TEs and phenotypes for which we had information from the study’s participants: tobacco use, alcohol consumption, and body mass index, calculated from height and weight (weight/height²). Manhattan plots of the three tests are shown in Figs. 3, 4 and 5. Several TEs were deemed significant, some of which are located in known genes: an Alu (chr10:15209391) in the gene NMT2 and an SVA (chr17:49150166) in the gene SPAG9 (BMI variations); the Alu on chr3:42856928 in the gene ACKR2, the Alu on chr11:102654750 in WTAPP1, the Alu on chr12:129970510 in TMEM132D (tobacco use/smoking) and the Alu on chr12:14020945 in the gene GRIN2B (alcohol consumption). Three of these genes also show evidence of genetic constraint and thus should be prioritized in further investigations, as genes showing evidence of purifying selection in healthy individuals may be judged more likely to cause certain kinds of disease. For instance, the gene NMT2 encodes one of two N-myristoyl-transferase proteins, allowing the regulation of signaling protein function and localization [44], and several variants of this gene have been associated with body height and hip-bone density [44–46], further strengthening the link between this gene and BMI variations. GRIN2B encodes a member of the ionotropic glutamate receptor superfamily and plays a major role in brain development and synaptic plasticity, with mutations in this gene often associated with neurodevelopmental disorders [47]. Moreover, variants of this gene have been associated with alcohol and tobacco consumption [48], general risk-taking behaviors [49], opioid dependence [50], and several neurological disorders such as schizophrenia [51] and Alzheimer’s disease [52].

Regarding tobacco use, the insertion in ACKR2 (also known as D6) emerged as one of the most promising results: the Alu acts as eQTL/sQTL in brain tissues and lung. The gene [53] controls chemokine levels and localization and is known to be involved in inflammatory responses [54]. Moreover, a work by Bazzan and colleagues [55] on chronic obstructive pulmonary disease (COPD) “demonstrates an increased expression of the atypical chemokine receptor D6 in peripheral lung from smokers with COPD but not in smoking subjects who did not develop the disease and nonsmoker control subjects”. Finally, TMEM132D, encoding a transmembrane protein, has already been associated with many neurological disorders such as anxiety and panic disorders [56] and general behavioral disinhibition, including alcohol consumption and dependence, illicit drug use, and nicotine use [57].

Polymorphic transposable elements emerge as a compelling avenue for elucidating human genetic diversity. The innovative utilization of polymorphic TEs as markers for genetic variability within isolated communities represents an unprecedented methodological advancement. This study demonstrates the utility of polymorphic TEs in effectively encapsulating genetic variability and historical contexts among isolates, substantiated by congruent outcomes with prior investigations relying on single nucleotide variants [3–5]. While progress has been made, the comprehensive impact of transposable elements on the human genome remains incompletely understood, as does the cascade of effects on diverse phenotypes. This investigation identifies numerous TE insertions correlated with specific phenotypes, such as substance use and metabolic disorders. It is imperative to underscore the exploratory nature of our analyses, necessitating further empirical validation to establish definitive causal links between these insertions and medical susceptibility. Nevertheless, the identified insertions stand as pivotal points of interest, providing a foundational platform for subsequent research. In the context of isolated communities, these populations serve as invaluable "laboratories," affording unique insights into the influence of transposable elements on physical, psychological, and behavioral traits. Consequently, prospective studies should prioritize the validation of identified variants and engage in selection analyses to discern potential instances of natural selection within these isolated populations. This forward-looking research agenda holds significant promise for advancing our understanding of the intricate interplay between transposable elements and human phenotypic traits.

Data availability

Genetic data of the isolated populations are available in the European Genome-phenome Archive (EGA) at the following links. BAM files: https://www.ebi.ac.uk/ega/studies/EGAS00001000252 (accessed on 21st February 2024); sample list and vcf files: https://www.ebi.ac.uk/ega/studies/EGAS00001001597 (accessed on 21st February 2024) and https://www.ebi.ac.uk/ega/datasets/EGAD00001002729 (accessed on 21st February 2024).

Author contributions

A.B. and M.M. conceived the study. G.M. (Giorgia Modenini), G.M. (Giacomo Mercuri) and P.A. performed all the in silico analyses and plotted results. G.G.N., A.S., P.T., B.S., A.P., G.P., M.P.C., G.G. and P.G. collected data and published them on EGA. G.M. (Giorgia Modenini), G.M. (Giacomo Mercuri), P.A., A.B. and M.M. wrote the manuscript. All Authors read and approved the manuscript.

Declaration of interests

The Authors declare that they have no competing interests.

Funding Declaration

The Authors declare that no funding was associated with this work.

Human Ethics and Consent to Participate

Not applicable.

Consent for Publication declarations

Not applicable.

1. Sazzini M, Gnecchi Ruscone GA, Giuliani C, Sarno S, Quagliariello A, De Fanti S, et al. Complex interplay between neutral and adaptive evolution shaped differential genomic background and disease susceptibility along the Italian peninsula. Sci Rep. 2016;6:32513.
2. Pesaresi S, Galdenzi D, Biondi E, Casavecchia S. Bioclimate of Italy: application of the worldwide bioclimatic classification system. Journal of Maps [Internet]. 2014 [cited 2024 Jan 29];10:538–53. Available from: http://www.tandfonline.com/doi/abs/10.1080/17445647.2014.891472
3. Esko T, Mezzavilla M, Nelis M, Borel C, Debniak T, Jakkula E, et al. Genetic characterization of northeastern Italian population isolates in the context of broader European genetic diversity. Eur J Hum Genet. 2013;21:659–65.
4. Cocca M, Barbieri C, Concas MP, Robino A, Brumat M, Gandin I, et al. A bird’s-eye view of Italian genomic variation through whole-genome sequencing. Eur J Hum Genet. 2020;28:435–44.
5. Xue Y, Mezzavilla M, Haber M, McCarthy S, Chen Y, Narasimhan V, et al. Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations. Nat Commun. 2017;8:15927.
6. Southam L, Gilly A, Süveges D, Farmaki A-E, Schwartzentruber J, Tachmazidou I, et al. Whole genome sequencing and imputation in isolated populations identify genetic associations with medically-relevant complex traits. Nat Commun. 2017;8:15606.
7. Gardner EJ, Lam VK, Harris DN, Chuang NT, Scott EC, Pittard WS, et al. The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology. Genome Res. 2017;27:1916–29.
8. Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50:W216–21.
9. Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526:68–74.
10. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics [Internet]. 2007 [cited 2023 Oct 24];81:559–75. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0002929707613524
11. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–9.
12. Alexander DH, Lange K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics. 2011;12:246.
13. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–4.
14. Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods. 2014;11:407–9.
15. Cao X, Zhang Y, Payer LM, Lords H, Steranka JP, Burns KH, et al. Polymorphic mobile element insertions contribute to gene expression and alternative splicing in human tissues. Genome Biol [Internet]. 2020 [cited 2022 Nov 18];21:185. Available from: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02101-4
16. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–91.
17. Petrovski S, Wang Q, Heinzen EL, Allen AS, Goldstein DB. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 2013;9:e1003709.
18. Mezzavilla M, Cocca M, Guidolin F, Gasparini P. A population-based approach for gene prioritization in understanding complex traits. Hum Genet. 2020;139:647–55.
19. Wang H, Xing J, Grover D, Hedges DJ, Han K, Walker JA, et al. SVA elements: a hominid-specific retroposon family. J Mol Biol. 2005;354:994–1007.
20. Gianfrancesco O, Bubb VJ, Quinn JP. SVA retrotransposons as potential modulators of neuropeptide gene expression. Neuropeptides. 2017;64:3–7.
21. Charlesworth B. Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 2009;10:195–205.
22. Hatzikotoulas K, Gilly A, Zeggini E. Using population isolates in genetic association studies. Brief Funct Genomics. 2014;13:371–7.
23. Destro Bisol G, Anagnostou P, Batini C, Battaggia C, Bertoncini S, Boattini A, et al. Italian isolates today: geographic and linguistic factors shaping human biodiversity. J Anthropol Sci. 2008;86:179–88.
Rishishwar L, Tellez Villa CE, Jordan IK. Transposable element polymorphisms recapitulate human evolution. Mob DNA. 2015;6:21.
Watkins WS, Feusier JE, Thomas J, Goubert C, Mallick S, Jorde LB. The Simons Genome Diversity Project: A Global Analysis of Mobile Element Diversity. Schaack S, editor. Genome Biology and Evolution [Internet]. 2020 [cited 2023 Oct 24];12:779–94. Available from: https://academic.oup.com/gbe/article/12/6/779/5828221
Lee JY, Ji Z, Tian B. Phylogenetic analysis of mRNA polyadenylation sites reveals a role of transposable elements in evolution of the 3’-end of genes. Nucleic Acids Res. 2008;36:5581–90.
Belancio VP, Roy-Engel AM, Deininger P. The impact of multiple splice sites in human L1 elements. Gene. 2008;411:38–45.
Hata K, Sakaki Y. Identification of critical CpG sites for repression of L1 transcription by DNA methylation. Gene. 1997;189:227–34.
Enriquez-Gasca R, Gould PA, Rowe HM. Host Gene Regulation by Transposable Elements: The New, the Old and the Ugly. Viruses. 2020;12:1089.
Kim DS, Hahn Y. Identification of human-specific transcript variants induced by DNA insertions in the human genome. Bioinformatics. 2011;27:14–21.
Pontis J, Planet E, Offner S, Turelli P, Duc J, Coudray A, et al. Hominoid-Specific Transposable Elements and KZFPs Facilitate Human Embryonic Genome Activation and Control Transcription in Naive Human ESCs. Cell Stem Cell. 2019;24:724-735.e5.
Cordaux R, Batzer MA. The impact of retrotransposons on human genome evolution. Nat Rev Genet. 2009;10:691–703.
Anwar SL, Wulaningsih W, Lehmann U. Transposable Elements in Human Cancer: Causes and Consequences of Deregulation. Int J Mol Sci. 2017;18:974.
Chénais B. Transposable Elements and Human Diseases: Mechanisms and Implication in the Response to Environmental Pollutants. Int J Mol Sci. 2022;23:2551.
Kazazian HH, Wong C, Youssoufian H, Scott AF, Phillips DG, Antonarakis SE. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature. 1988;332:164–6.
Nakamura Y, Murata M, Takagi Y, Kozuka T, Nakata Y, Hasebe R, et al. SVA retrotransposition in exon 6 of the coagulation factor IX gene causing severe hemophilia B. Int J Hematol. 2015;102:134–9.
Payer LM, Burns KH. Transposable elements in human genetic disease. Nat Rev Genet. 2019;20:760–72.
Jelassi A, Slimani A, Rabès JP, Jguirim I, Abifadel M, Boileau C, et al. Genomic characterization of two deletions in the LDLR gene in Tunisian patients with familial hypercholesterolemia. Clin Chim Acta. 2012;414:146–51.
Reilly MT, Faulkner GJ, Dubnau J, Ponomarev I, Gage FH. The role of transposable elements in health and diseases of the central nervous system. J Neurosci. 2013;33:17577–86.
Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–53.
Erwin JA, Marchetto MC, Gage FH. Mobile DNA elements in the generation of diversity and complexity in the brain. Nat Rev Neurosci. 2014;15:497–506.
Rau V, Fanselow MS. Exposure to a stressor produces a long lasting enhancement of fear learning in rats. Stress. 2009;12:125–33.
Ponomarev I, Rau V, Eger EI, Harris RA, Fanselow MS. Amygdala transcriptome and cellular mechanisms underlying stress-enhanced fear learning in a rat model of posttraumatic stress disorder. Neuropsychopharmacology. 2010;35:1402–11.
Yengo L, Vedantam S, Marouli E, Sidorenko J, Bartell E, Sakaue S, et al. A saturated map of common genetic variants associated with human height. Nature. 2022;610:704–12.
Christakoudi S, Evangelou E, Riboli E, Tsilidis KK. GWAS of allometric body-shape indices in UK Biobank identifies loci suggesting associations with morphogenesis, organogenesis, adrenal cell renewal and cancer. Sci Rep. 2021;11:10688.
Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 2021;53:1415–24.
Platzer K, Lemke JR. GRIN2B-Related Neurodevelopmental Disorder. In: Adam MP, Feldman J, Mirzaa GM, Pagon RA, Wallace SE, Bean LJ, et al., editors. GeneReviews® [Internet]. Seattle (WA): University of Washington, Seattle; 1993 [cited 2024 Jan 29]. Available from: http://www.ncbi.nlm.nih.gov/books/NBK501979/
Saunders GRB, Wang X, Chen F, Jang S-K, Liu M, Wang C, et al. Genetic diversity fuels gene discovery for tobacco and alcohol use. Nature [Internet]. 2022 [cited 2024 Jan 29];612:720–4. Available from: https://www.nature.com/articles/s41586-022-05477-4
Karlsson Linnér R, Biroli P, Kong E, Meddens SFW, Wedow R, Fontana MA, et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat Genet. 2019;51:245–57.
Sherva R, Zhu C, Wetherill L, Edenberg HJ, Johnson E, Degenhardt L, et al. Genome-wide association study of phenotypes measuring progression from first cocaine or opioid use to dependence reveals novel risk genes. Explor Med. 2021;2:60–73.
Goes FS, McGrath J, Avramopoulos D, Wolyniec P, Pirooznia M, Ruczinski I, et al. Genome-wide association study of schizophrenia in Ashkenazi Jews. Am J Med Genet B Neuropsychiatr Genet. 2015;168:649–59.
Kulminski AM, Loiko E, Loika Y, Culminskaya I. Pleiotropic predisposition to Alzheimer’s disease and educational attainment: insights from the summary statistics analysis. Geroscience. 2022;44:265–80.
Nibbs RJ, Wylie SM, Yang J, Landau NR, Graham GJ. Cloning and characterization of a novel promiscuous human beta-chemokine receptor D6. J Biol Chem. 1997;272:32078–83.
Cancellieri C, Caronni N, Vacchini A, Savino B, Borroni EM, Locati M, et al. Review: Structure-function and biological properties of the atypical chemokine receptor D6. Mol Immunol. 2013;55:87–93.
Bazzan E, Saetta M, Turato G, Borroni EM, Cancellieri C, Baraldo S, et al. Expression of the atypical chemokine receptor D6 in human alveolar macrophages in COPD. Chest. 2013;143:98–106.
Otowa T, Maher BS, Aggen SH, McClay JL, van den Oord EJ, Hettema JM. Genome-wide and gene-based association studies of anxiety disorders in European and African American samples. PLoS One. 2014;9:e112559.
McGue M, Zhang Y, Miller MB, Basu S, Vrieze S, Hicks B, et al. A genome-wide association study of behavioral disinhibition. Behav Genet. 2013;43:363–73.

No competing interests reported.

SupplementaryMaterials.pdf


Editorial decision: Revision requested (25 Mar, 2024)
Reviews received at journal (22 Mar, 2024)
Reviewers agreed at journal (12 Mar, 2024)
Reviewers invited by journal (11 Mar, 2024)
Editor assigned by journal (07 Mar, 2024)
Submission checks completed at journal (06 Mar, 2024)
First submitted to journal (24 Feb, 2024)

