Bioinformatic analysis of the TARDBP promoter
The TARDBP gene sequence (Ensembl gene ID: ENSG00000120948) was initially characterised by bioinformatic analysis using the UCSC Genome Browser (GRCh37/hg19). The TARDBP gene, which maps to chromosome 1p36.22, spans 12870 bp from position 11072679 to 11085548 on the forward DNA strand. When the UCSC TARDBP transcripts were compared with those retrieved from the Database of Transcriptional Start Sites (DBTSS, http://dbtss.hgc.jp 37), it was evident that most of the transcription start sites (TSS) coincide with that of the reference sequence (position 11072679) (Figure 1). There appear to be no tissue-specific TSS, since a single mRNA species seems to predominate in adult and fetal tissues, as well as across different tissues. Figure 1 shows the TARDBP promoter sequence, starting 1316 nucleotides upstream of the TSS of the NM_007375.3 TDP-43 transcript.
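The coordinate arithmetic behind these figures can be checked with a short sketch. It assumes 1-based, inclusive coordinates on the forward strand, as quoted above; the function names are illustrative only and do not belong to any genome browser tool.

```python
from typing import Tuple

# Coordinates quoted in the text (GRCh37/hg19, chromosome 1, forward strand).
GENE_START = 11072679   # also the reference TSS (+1)
GENE_END = 11085548
PROMOTER_LENGTH = 1316  # nucleotides upstream of the TSS shown in Figure 1


def inclusive_span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


def upstream_window(tss: int, length: int) -> Tuple[int, int]:
    """1-based inclusive interval covering `length` nt immediately upstream
    of a forward-strand TSS (positions -length to -1)."""
    return tss - length, tss - 1


gene_length = inclusive_span(GENE_START, GENE_END)
promoter_start, promoter_end = upstream_window(GENE_START, PROMOTER_LENGTH)

print(gene_length)                   # 12870
print(promoter_start, promoter_end)  # 11071363 11072678
```

Under these assumptions the gene span reproduces the 12870 bp stated above, and the 1316-nt upstream window ends one nucleotide before the TSS.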
Subsequently, querying the Eukaryotic Promoter Database 38 identified several putative TATA-box motifs ([p-value = 0.01]: -930, -865, -606, -571, -513, -309) and CG-box sequences (CCAAT-box [p-value = 0.01]: -744, -677, -572, -61), but closer inspection revealed that their positions are not fully consistent with the main mapped TSS (DBTSS). In addition, analysis with another promoter search tool (GPMiner 39) also failed to detect the main core promoter elements.
These observations suggest that the expression of the human TARDBP gene is driven by a TATA-less promoter.
Conservation of TARDBP promoter sequence throughout evolution
In order to study the evolutionary conservation of the TARDBP promoter, the 1316 nucleotides upstream of the TSS of TARDBP transcripts from human and other species were retrieved using GenBank and the Ensembl Genome Browser (http://www.ensembl.org/). Alignments were performed using the MUSCLE software 40.
Among primates, we compared the genomic sequences of the putative TARDBP promoter region from Hominoidea (Homo sapiens - Human, Pan paniscus - Bonobo, and Nomascus leucogenys - Gibbon), from a New World monkey (Callithrix jacchus - Common marmoset) and from Old World monkeys (Macaca mulatta - Rhesus macaque; Papio anubis - Olive baboon).
In general, this comparison showed that the degree of similarity among the species compared increases significantly with proximity to the TSS. In the region spanning nucleotides -1316 to -1000 upstream of the TSS, the hominoid sequences appear closer to each other than to those of the New and Old World monkeys. By contrast, in the region spanning the 1000 nucleotides upstream of the TSS, the sequences from all primates show a high degree of identity (Figure 2). The alignment of the human, mouse and rat putative TARDBP promoter sequences, on the other hand, shows a limited degree of identity (Figure 3). Nevertheless, the region of the rodent promoters spanning approximately 500 nt upstream of the TSS shows a higher level of similarity, suggesting that the proximal region of the putative TARDBP promoter might encompass regulatory elements conserved across primates and rodents.
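One way to quantify how identity changes with distance from the TSS is to score fixed-size windows along a multiple alignment. The sketch below is an illustration of that idea only, not the procedure used to produce Figures 2 and 3; the function names are hypothetical and the toy sequences are invented, not TARDBP data.

```python
def column_identity(column: str) -> bool:
    """True if every sequence carries the same non-gap residue in this column."""
    return "-" not in column and len(set(column.upper())) == 1


def windowed_identity(aligned, window, step):
    """Fraction of fully identical columns in successive alignment windows.

    `aligned` is a list of equal-length, gap-padded sequences, e.g. the rows
    of a MUSCLE alignment read into memory.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    identical = [column_identity("".join(col)) for col in zip(*aligned)]
    results = []
    for start in range(0, length - window + 1, step):
        chunk = identical[start:start + window]
        results.append((start, sum(chunk) / window))
    return results


# Toy alignment (hypothetical sequences, NOT TARDBP data).
toy = [
    "ACGTACGTAAAA",
    "ACCTAC-TAAAA",
    "AGGTACGTAAAA",
]
for start, fraction in windowed_identity(toy, window=4, step=4):
    print(start, fraction)
```

With real alignments, a window of a few tens of nucleotides would be a more sensible choice; columns containing gaps are counted as non-identical here, which is one of several possible conventions.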
Characterization of cis-regulatory sequences of TARDBP promoter
In order to identify the minimal functional sequence responsible for the transcriptional activity of the human TARDBP promoter, a luciferase assay was set up using deletion fragments of the putative TARDBP promoter cloned into the pGL4 luciferase reporter vector. Different segments of the putative regulatory region were amplified by PCR, and eight constructs differing in length were produced, each extending from 27 nucleotides downstream of the TSS (+1 nt, RefSeq NM_007375.3) to 1316, 927, 451, 380, 320, 280, 230 or 180 nucleotides upstream of the TSS (Figure 4).
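The expected genomic interval and size of each deletion fragment can be sketched as follows. This assumes that the TSS itself is position +1 (so that +27 lies 26 nt after it) and uses the hg19 TSS coordinate given earlier; if the downstream boundary was counted differently, the end coordinate shifts by one.

```python
TSS = 11072679   # reference TSS (+1), forward strand, GRCh37/hg19
DOWNSTREAM = 27  # nucleotides downstream of the TSS present in every construct
UPSTREAM_LENGTHS = [1316, 927, 451, 380, 320, 280, 230, 180]


def construct_interval(upstream, downstream=DOWNSTREAM, tss=TSS):
    """Return (start, end, length) of a fragment spanning -upstream to +downstream.

    Assumption: there is no position 0, the TSS is +1, so +downstream is
    located (downstream - 1) nucleotides after the TSS.
    """
    start = tss - upstream
    end = tss + downstream - 1
    return start, end, end - start + 1


for upstream in UPSTREAM_LENGTHS:
    start, end, length = construct_interval(upstream)
    print(f"{upstream:>5}  chr1:{start}-{end}  {length} bp")
```

Under this convention each fragment is simply its upstream length plus 27 bp, for example 1343 bp for the 1316 construct.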
The promoter fragments were subcloned into the pGL4.11 vector and transfected into cell lines of different tissue origin (HEK293, HeLa, Neuro2A and SH-SY5Y), and the luciferase activity of each construct was normalised to that of the 1316 construct (i.e., 1316 = 1).
Intracellular analysis showed that most promoter activity was retained in the fragments extending from 1316 to 451 nucleotides upstream of the TSS in all tested cell lines (Figure 4). Further deletions resulted in a dramatic reduction of activity (380 and 330 constructs), with almost complete loss of activity when the promoter was shortened to 230 and 180 nucleotides upstream of the TSS, in all tested cell lines. These results show that the minimal promoter region encompasses the 451 nucleotides upstream of the TSS.
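The normalisation to the 1316 construct can be illustrated with a minimal sketch. It assumes that each replicate is first expressed as a firefly/Renilla ratio (a Renilla plasmid is mentioned for the co-transfections described later, but its use here is an assumption) and that replicate ratios are averaged before scaling; all readings below are invented for illustration.

```python
from statistics import mean


def relative_activity(readings, reference):
    """Scale mean firefly/Renilla ratios so that the reference construct equals 1.

    `readings` maps a construct name to a list of (firefly, renilla) replicate
    readings; `reference` is the name of the construct used as the unit.
    """
    ratios = {
        name: mean(firefly / renilla for firefly, renilla in replicates)
        for name, replicates in readings.items()
    }
    reference_ratio = ratios[reference]
    return {name: ratio / reference_ratio for name, ratio in ratios.items()}


# Hypothetical luminometer readings (NOT data from this study).
toy_readings = {
    "1316": [(1000.0, 100.0), (1200.0, 100.0)],
    "451": [(900.0, 100.0), (1080.0, 120.0)],
    "180": [(55.0, 100.0), (55.0, 100.0)],
}

for name, value in relative_activity(toy_readings, reference="1316").items():
    print(name, round(value, 3))
```

The same function can take any other construct or cell line as the unit, for example the 451 construct or HEK293 cells, by changing `reference`.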
We also looked for tissue-specific promoter activity. To this end, we compared the activity of the 1316, 927 and 451 promoter fragments among the SH-SY5Y, Neuro2A, HeLa and HEK293 cell lines, taking HEK293 cells as the unitary reference. The constructs showed approximately 6-8x higher activity in the neuronal cell lines (SH-SY5Y, Neuro2A) and in HeLa than in HEK293 (Figure 5).
Transcriptional effects of SNPs found within the TARDBP promoter of ALS patients
The analysis of pathogenic mutations in the TARDBP gene sequence of 46 Australian patients of European descent affected by sporadic Amyotrophic Lateral Sclerosis (sALS) revealed two promoter variants (c.1-562t>c and c.1-100t>c) whose frequency differed between patients and controls (115 neurologically normal people or HapMap European and Sub-Saharan African cohorts) 36. The c.1-562t>c single nucleotide substitution (rs9430335; NG_008734.1:g.4439C>T; NM_007375.3:c.-696C>T) was found in homozygosity (C/C) at a higher frequency and in heterozygosity (T/C) at a lower frequency in sALS patients than in controls (0.2 vs 0.06 and 0.1 vs 0.3, respectively). The c.1-100t>c polymorphism (rs968545; NG_008734.1:g.4901T>C; NM_007375.3:c.-234T>C), on the other hand, was present only in heterozygosity (T/C), at a higher frequency in patients than in controls (0.2 vs 0.1) 36.
We therefore sought to test the impact of these SNPs on the transcription of TDP-43, in order to find a possible functional correlation with the manifestation of ALS. To analyze the effects of the SNPs alone or in combination, we created three variants of our original 927 pGL4 construct (4439:T; 4901:T), thus reproducing all possible allele combinations: (4439:T; 4901:C), (4439:C; 4901:C) and (4439:C; 4901:T) (Figure 6A).
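That three variants plus the original construct cover every combination of two biallelic positions can be verified by enumeration. The sketch below uses the NG_008734.1 positions quoted above as labels; the variable names are illustrative.

```python
from itertools import product

# Two biallelic promoter positions (NG_008734.1 numbering used in the text).
SNP_ALLELES = {4439: ("T", "C"), 4901: ("T", "C")}
REFERENCE = {4439: "T", 4901: "T"}  # alleles of the original 927 construct

# Every combination of one allele per position.
combinations = [
    dict(zip(SNP_ALLELES, alleles)) for alleles in product(*SNP_ALLELES.values())
]
# The constructs that had to be generated are those differing from the original.
variants = [combo for combo in combinations if combo != REFERENCE]

for combo in variants:
    print("; ".join(f"{position}:{allele}" for position, allele in combo.items()))
```

Two positions with two alleles each give four combinations, one of which is the original construct, leaving the three variants described.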
The constructs were transfected into the SH-SY5Y cell line. The luciferase activity of the three mutant constructs was compared with that of the control (4439:T; 4901:T), used as the unitary reference: no statistically significant differences were observed in the transcriptional activity of the three variants (Figure 6B).
TDP-43 does not influence its own transcription
After characterizing the sequence of the TARDBP promoter and its activity in different cell lines, we wished to evaluate the ability of TDP-43 to influence, directly or indirectly, the synthesis of its own transcript. For transfection of the promoter constructs we used our inducible HEK293-TDP wt cell line 29, which allows a more homogeneous TDP-43 overexpression than transient overexpression of this factor.
These cells were transfected with the 451, 927 and 1316 constructs (along with a Renilla plasmid). TDP-43 expression was induced with tetracycline (48 hrs) and its levels were probed by Western blotting (Figure 7A). The analysis of luciferase activity 48 hrs later did not show significant differences upon tetracycline-induced TDP-43 overexpression (Figure 7B). Similar results were obtained in SH-SY5Y cells transiently transfected with the 451, 927 and 1316 constructs (with the Renilla reporter) after transient TDP-43 overexpression (data not shown). These results suggest that TDP-43 does not influence its own transcription.
The TARDBP 5'UTR and intron 1 splicing positively impact luciferase expression
The 5'UTR of the TARDBP gene encompasses exon 1 (102 bp) and the first 12 bp of exon 2, separated by intron 1 (972 bp). In order to explore the presence of additional elements able to modulate TDP-43 expression, the functional impact of the 5'UTR (construct 451+Ex1Ex2) and intron 1 (construct 451+Ex1-IVS1-Ex2) of TARDBP was analysed by generating variants of the 451 plasmid (Figure 7C). The first variant contained the correctly spliced TARDBP 5'UTR (exon 1, 102 bp and exon 2, 12 bp) (451+Ex1Ex2). The second construct was created by inserting the region encompassing exon 1 (102 bp), intron 1 (972 bp) and the first 12 bp of exon 2 of the TARDBP gene between the 451 promoter and the luciferase ATG codon (451+Ex1-IVS1-Ex2 wt). The third construct was a mutant of the latter in which the 3' splice site of intron 1 was disrupted (451+Ex1-IVS1-Ex2 mut).
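The insert sizes implied by these segment lengths can be worked out with a short sketch. It assumes that splicing removes intron 1 exactly and that the mutant, if unspliced, retains the whole intron; use of a cryptic splice site would give a different size.

```python
# Segment lengths quoted in the text.
EXON1 = 102      # bp
INTRON1 = 972    # bp
EXON2_UTR = 12   # first 12 bp of exon 2 (5'UTR portion)

spliced_utr = EXON1 + EXON2_UTR              # pre-spliced 5'UTR insert
unspliced_insert = EXON1 + INTRON1 + EXON2_UTR  # insert including intron 1

expected_sizes = {
    "451+Ex1Ex2": spliced_utr,
    "451+Ex1-IVS1-Ex2 wt (after splicing)": unspliced_insert - INTRON1,
    # Assumption: no splicing at all once the 3' splice site is disrupted.
    "451+Ex1-IVS1-Ex2 mut (intron retained)": unspliced_insert,
}

for construct, size in expected_sizes.items():
    print(f"{construct}: {size} bp between promoter and luciferase ATG")
```

On these assumptions, a correctly spliced transcript from the wild-type intron-containing construct carries the same 114-nt 5'UTR as the pre-spliced construct, whereas intron retention adds 972 nt.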
In SH-SY5Y cells, a correctly sized amplicon was observed only after transfection of the 451, 451+Ex1Ex2 and 451+Ex1-IVS1-Ex2 wt constructs (Figure 7D). The relative luciferase activities appear to correlate positively with the occurrence of splicing. When the same constructs were transfected into SH-SY5Y cells and their luciferase activity was measured (Figure 7E), the activity of the 451+Ex1Ex2 construct (containing the pre-arranged, correctly spliced 5'UTR) was 1.5x that of the control. The 451+Ex1-IVS1-Ex2 wt construct showed a 5x increase in luciferase activity compared with the control (451 construct). By contrast, the 451+Ex1-IVS1-Ex2 mut construct showed negligible activity compared with the other constructs (Figure 7E).
Altogether, these results suggest that the presence of the 5'UTR and the correct splicing of intron 1 can modulate the expression of luciferase (and, potentially, of TDP-43) at the transcriptional level.