Studying isolated communities, the basis of population genetics studies [21, 22], allows researchers to examine genomes that show high homogeneity and are subject to similar environmental and cultural pressures, such as lifestyle habits, diet, sanitary conditions, and disease vectors. These isolates are also ideal for studying the phenotypic effects of variants that are otherwise only marginally present in larger populations [22]. In this picture, Italian isolates are particularly important, mainly because of the peninsula’s central role in human migrations since prehistoric times and because of the high number of genetically distinct isolated communities that have been established throughout history [23]. While previous studies on isolates have relied on single nucleotide variants, in this work we used polymorphic TEs: these markers have become available only in recent years [7, 24, 25], adding a new source of data for studying the diversity of the human genome. To our knowledge, such markers have never been used to study the genetic underpinnings of human isolated communities and, therefore, this study is the first of its kind.
Using the Mobile Element Locator Tool [7], more than 12,000 polymorphic TEs were identified in the six villages of Friuli-Venezia Giulia. These TEs were used as genetic markers 1) to study the differentiation of the communities; 2) to explore the genetic variability of the isolates; and 3) to analyze their possible role as genetic variants underlying susceptibility to different behavioral traits or medical conditions (tobacco use, alcohol consumption and BMI variations).
First, allele and genotype frequencies of the identified TEs were calculated with a custom Python script: of 12,709 TEs, 3,987 had significantly different allele frequencies among the six isolates (Fisher’s exact test, p-value < 0.01).
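As an illustration of this step, the following minimal sketch computes the insertion allele frequency from diploid genotypes and compares two populations with a two-sided Fisher’s exact test on a 2×2 table of allele counts. It is not the script used in this work: the genotype coding (0, 1 or 2 copies of the insertion, `None` for a missing call), the function names and the example genotypes are assumptions made for illustration, and the pairwise 2×2 comparison is a simplification of a test across six isolates.

```python
from math import comb


def allele_counts(genotypes):
    """Return (insertion alleles, reference alleles) for diploid genotypes.

    Genotypes are assumed to be coded as 0, 1 or 2 copies of the insertion;
    None marks a missing call and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    inserted = sum(called)
    return inserted, 2 * len(called) - inserted


def allele_frequency(genotypes):
    """Frequency of the insertion allele among the called chromosomes."""
    inserted, reference = allele_counts(genotypes)
    total = inserted + reference
    return inserted / total if total else float("nan")


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table count.
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical genotypes for one TE in two villages (not real data).
village_a = [2, 2, 1, 2, 1, 2, 2, 1]
village_b = [0, 0, 1, 0, 0, 1, 0, None]

p_value = fisher_exact_2x2(*allele_counts(village_a), *allele_counts(village_b))
print(allele_frequency(village_a), allele_frequency(village_b), p_value)
```

A TE would then be flagged as differentiated when the p-value falls below the chosen threshold (0.01 in the text).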
Then, TEs were used as markers for exploratory analyses, such as PCA and Admixture, to examine the general diversity and ancestry of the isolates in the context of European genetic variability. In the PCA of European populations from 1KGP [7] and FVG isolates (Fig. 2A), the isolates tend to cluster among themselves: PC1 separates Europeans from the FVG subpopulations, while PC2 shows a differentiation between the isolates, which are distributed along the second principal component. In Fig. 2B, on the other hand, PC2 separates Resia from Clauzetto, while the third component separates Sauris from Illegio; furthermore, Erto, San Martino, and most individuals from Clauzetto cluster with the other European populations. Interestingly, Clauzetto is the least isolated village among the six FVG isolates [3], while Clauzetto, Erto and San Martino are genetically closest to the considered European populations and have the lowest inbreeding coefficients among the villages [4]. Similarly, in the Admixture graph, the FVG isolates are clearly distinct from the European populations and tend to be dominated by their own ancestry components. PCA and Admixture results are in line with previous studies on the same isolates performed with SNPs [3–5]; moreover, the observed patterns of genetic variability and ancestry components could be explained by genetic drift, a suggestion also made by previous works on the same dataset [3–5]. The strong agreement between SNP-driven and TE-driven results in terms of population structure further indicates that the genomic pattern of polymorphic TEs is mainly the result of demographic events.
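The PCA step can be sketched as follows, assuming a genotype matrix with one row per individual and one column per TE, coded as 0, 1 or 2 insertion copies. The function name `genotype_pca`, the per-marker standardization and the simulated data are illustrative assumptions and do not reproduce the software or settings used for Fig. 2.

```python
import numpy as np


def genotype_pca(genotypes, n_components=3):
    """PCA of an individuals x markers genotype matrix via SVD.

    Monomorphic markers are dropped, the remaining columns are centered and
    scaled to unit variance, and the individuals are projected onto the
    leading principal components.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]                 # drop monomorphic markers
    z = (g - g.mean(axis=0)) / g.std(axis=0)    # standardize each marker
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)         # fraction of variance per PC
    return scores, explained[:n_components]


# Simulated example (not real data): two groups of 20 individuals whose
# insertion allele frequencies differ at 50 markers.
rng = np.random.default_rng(0)
group_a = rng.binomial(2, 0.1, size=(20, 50))
group_b = rng.binomial(2, 0.9, size=(20, 50))
scores, explained = genotype_pca(np.vstack([group_a, group_b]))
print(scores[:, 0].round(2), explained.round(3))
```

In this toy setting the first component separates the two simulated groups, which is the same kind of pattern read from PC1 in the text.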
In the last decades, we have come to know much more about the impact of these elements on the genome and gene networks, and it has been shown that TE insertions can generate diversity in a variety of ways. For example, transposable elements have been shown to provide polyadenylation signals that induce the termination of transcripts [26], to modify splicing patterns and provide new splicing sites [27], to epigenetically affect nearby genes [28, 29], to act as novel promoters, enhancers, and transcription factor binding sites [30, 31], and often to carry their own enhancers and promoters [32]. With their innate ability to act as disruptors and deregulators of gene expression, TE insertions have been associated with a variety of human diseases: for example, several cancer types [33, 34], hemophilia A and B [35, 36], some inheritable genetic diseases such as Dent’s disease or Duchenne muscular dystrophy [37], metabolic diseases [38], substance abuse, and central nervous system diseases [39].
In particular, considerable attention has been paid in recent years to the impact of transposable elements on the central nervous system [39–41], jumpstarted by the NGS revolution, which allowed for the efficient typing of thousands of transposable elements at once. Genome-wide approaches allowed researchers to study the role of transposable elements in stress-related learning mechanisms in rats [42], which have been used as a model for post-traumatic stress disorder (PTSD) in humans [43]. Likewise, transposable elements have also been associated with alcoholism in humans using the same genome-wide approach [39].
Three association tests were run with GEMMA [13, 14], using sex and age as covariates, to test for associations between polymorphic TEs and the phenotypes for which information was available from the study’s participants: tobacco use, alcohol consumption, and body mass index (BMI), calculated from height and weight as weight/height². Manhattan plots of the three tests are shown in Figs. 3, 4 and 5. Several TEs were deemed significant, some of which are located in known genes: an Alu (chr10:15209391) in the gene NMT2 and an SVA (chr17:49150166) in the gene SPAG9 (BMI variations); the Alu on chr3:42856928 in the gene ACKR2, the Alu on chr11:102654750 in WTAPP1, and the Alu on chr12:129970510 in TMEM132D (tobacco use/smoking); and the Alu on chr12:14020945 in the gene GRIN2B (alcohol consumption). Three of these genes also show evidence of genetic constraint and thus should be prioritized in further investigations, as genes showing evidence of purifying selection in healthy individuals may be judged more likely to cause certain kinds of disease. For instance, the gene NMT2 encodes one of two N-myristoyl-transferase proteins, which regulate the function and localization of signaling proteins [44], and several variants of this gene have been associated with body height and hip-bone density [44–46], further strengthening the link between this gene and BMI variations. GRIN2B encodes a member of the ionotropic glutamate receptor superfamily and plays a major role in brain development and synaptic plasticity, with mutations in this gene often associated with neurodevelopmental disorders [47]. Moreover, variants of this gene have been associated with alcohol and tobacco consumption [48], general risk-taking behaviors [49], opioid dependence [50], and several neurological disorders such as schizophrenia [51] and Alzheimer’s disease [52].
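To make the structure of such a test concrete, the sketch below computes BMI and estimates the effect of a single TE genotype on a quantitative phenotype while adjusting for sex and age. It deliberately swaps in ordinary least squares for GEMMA’s linear mixed model, so it ignores relatedness between individuals and is not the analysis performed here. The units (kilograms and meters), the function names and the toy values are assumptions made for illustration.

```python
import numpy as np


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared (kg / m^2 assumed)."""
    return np.asarray(weight_kg, dtype=float) / np.asarray(height_m, dtype=float) ** 2


def marker_effect(phenotype, genotype, sex, age):
    """Least-squares effect of one marker, adjusted for sex and age.

    Fits phenotype = b0 + b1 * genotype + b2 * sex + b3 * age and returns b1.
    Simplification: plain OLS, without the kinship term of a mixed model.
    """
    y = np.asarray(phenotype, dtype=float)
    design = np.column_stack([np.ones_like(y), genotype, sex, age]).astype(float)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(beta[1])


# Toy data (not real participants): genotype coded as 0/1/2 insertion copies.
genotype = [0, 1, 2, 0, 1, 2, 1, 0]
sex = [0, 1, 0, 1, 1, 0, 0, 1]
age = [30, 45, 52, 61, 38, 27, 70, 49]
weight = [62.0, 80.5, 91.0, 70.2, 77.3, 88.0, 74.1, 66.4]
height = [1.68, 1.80, 1.75, 1.72, 1.78, 1.70, 1.65, 1.74]

print(marker_effect(bmi(weight, height), genotype, sex, age))
```

In a genome-wide scan, the same model would be fitted once per TE and the resulting p-values plotted by genomic position, as in a Manhattan plot.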
Regarding tobacco use, the insertion in ACKR2 (also known as D6) emerged as one of the most promising results: the Alu acts as an eQTL/sQTL in brain and lung tissues. ACKR2 [53] controls chemokine levels and localization and is known to be involved in inflammatory responses [54]. Moreover, a work by Bazzan and colleagues [55] on chronic obstructive pulmonary disease (COPD) “demonstrates an increased expression of the atypical chemokine receptor D6 in peripheral lung from smokers with COPD but not in smoking subjects who did not develop the disease and nonsmoker control subjects”. Finally, TMEM132D, which encodes a transmembrane protein, has already been associated with many neurological disorders such as anxiety and panic disorders [56] and with general behavioral disinhibition, including alcohol consumption and dependence, illicit drug use, and nicotine use [57].