Open Genes data
Genes selection
The Open Genes database aims to collect all available data on the genetics of aging and to structure it for analysis and for the search for aging therapy targets. An overview of Open Genes is presented in Fig. 1.
We collected several types of data regarding each gene. We distinguished 6 types of studies and 12 criteria for adding genes to the Open Genes database:
- Changes in gene activity affect model organism lifespan:
  - Changes in gene activity extend the mammalian lifespan
  - Changes in gene activity extend the non-mammalian lifespan
  - Changes in gene activity reduce the mammalian lifespan
  - Changes in gene activity reduce the non-mammalian lifespan
- Age-related changes in gene expression, methylation or protein activity:
  - Age-related changes in humans
  - Age-related changes in mammals
  - Age-related changes in non-mammals
- Changes in gene activity affect the age-related process:
  - Changes in gene activity protect against the age-related impairment
  - Changes in gene activity enhance the age-related deterioration
- Association of gene variants or expression levels with longevity
- Association of the gene with accelerated aging in humans
- Regulation of genes associated with aging
A gene was added to the Open Genes database if it met at least one of the 12 criteria. Figure 2 shows the Open Genes database statistics. At present, Open Genes contains 2402 genes associated with aging.
Filters on the main genes table (https://open-genes.com/genes) of the database allow users to generate lists of genes according to individual selection criteria, combinations of criteria and many other parameters.
Ranking the criteria
Of course, meeting one of these 12 criteria is not sufficient to prove a gene-longevity association in humans. Therefore, we assigned each criterion a high, medium, or low priority (Fig. 3).
It is unclear whether reversing the activity change of a gene whose manipulation decreased lifespan will increase life expectancy. We believe that genetic manipulations that increase lifespan provide stronger evidence for considering a gene as an aging therapy target than manipulations that decrease lifespan. Therefore, we assigned lifespan experiments a different priority depending on the outcome: evidence of an increase in the lifespan of model organisms received high priority.
Medium priority is given to data on protection against the age-related impairment, association of a gene with accelerated aging in humans, decrease in the lifespan of model organisms, age-related changes in gene expression in humans and mammals, and associations of gene variants with longevity. All other criteria, such as deterioration in age-related processes, regulation of other age-related genes and age-related changes in non-mammals, have low priority.
Division of aging-associated genes to different confidence levels
We divided all genes into five confidence levels: highest, high, moderate, low and lowest (Fig. 3). The group of genes with the highest confidence level includes 25 human genes that increased the lifespan of mammals and are also known to have variants associated with human longevity.
We proceeded from the assumption that it remains unclear if we can extrapolate to humans the results obtained on model organisms. At the same time, population studies of the genetics of longevity often show conflicting results and only let us suggest the influence of a particular gene on the predisposition to longevity. Therefore, we used data on the association with longevity as an additional parameter for genes that extended the mammalian lifespan. Table 1 shows Open Genes data on these 25 genes.
Table 1
Genes associated with human longevity and increased mammalian lifespan

| Gene symbol | Gene name | Organism, ortholog and number of studies/entries* for data on increased lifespan | Effect on gene function, increased lifespan | Conflicting data or dependence on experimental conditions** | Number of studies/variants for data on longevity associations | Conflicting data or dependence on cohort*** | Effect of polymorphism on gene expression▴ |
|---|---|---|---|---|---|---|---|
| ADRA1A | adrenoceptor alpha 1A | Mus musculus, Adra1a, 1/1 | gain of function | No | 1/1 | No | |
| AGTR1 | angiotensin II receptor type 1 | Mus musculus, Agtr1a, 1/1 | loss of function | No | 1/3 | Yes | decreased gene expression |
| AKT1 | AKT serine/threonine kinase 1 | Mus musculus, Akt1, 1/2 | loss of function | No | 1/1 | No | decreased protein activity |
| GH1 | growth hormone 1 | Rattus norvegicus, Gh1, 2/3 | loss of function | Yes | 1/1 | No | decreased gene expression |
| GHR | growth hormone receptor | Mus musculus, Ghr, 5/12 | loss of function | Yes | 3/10 | Yes | |
| GHRHR | growth hormone releasing hormone receptor | Mus musculus, Ghrhr, 1/1 | loss of function | No | 3/3 | Yes | |
| GRN | granulin precursor | Mus musculus, Grn, 1/1 | loss of function | Yes | 1/4 | No | |
| HTT | huntingtin | Mus musculus, Htt, 1/1 | switch of function | No | 1/2 | Yes | |
| IGF1R | insulin-like growth factor 1 receptor | Mus musculus, Igf1r, 1/2 | loss of function | No | 4/3 | Yes | decreased gene expression |
| INSR | insulin receptor | Mus musculus, Insr, 1/1 | loss of function | No | 1/1 | No | |
| IRS2 | insulin receptor substrate 2 | Mus musculus, Irs2, 2/5 | loss of function | Yes | 1/1 | No | |
| KL | klotho | Mus musculus, Kl, 1/4 | gain of function | No | 3/4 | Yes | |
| MSTN | myostatin | Mus musculus, Mstn, 1/1 | loss of function | Yes | 1/1 | No | |
| NFKBIA | NFKB inhibitor alpha | Mus musculus, Nfkbia, 1/1 | gain of function | No | 1/2 | No | increased gene expression |
| PAPPA | pappalysin 1 | Mus musculus, Pappa, 3/4 | loss of function | No | 1/2 | No | |
| PPARG | peroxisome proliferator activated receptor gamma | Mus musculus, Pparg, 2/2 | gain of function | No | 2/2 | Yes | |
| PTEN | phosphatase and tensin homolog | Mus musculus, Pten, 1/2 | gain of function | No | 1/1 | Yes | |
| RPS6KB1 | ribosomal protein S6 kinase B1 | Mus musculus, Rps6kb1, 1/2 | loss of function | No | 1/2 | No | |
| SIRT1 | sirtuin 1 | Mus musculus, Sirt1, 1/3 | gain of function | Yes | 3/3 | No | increased gene expression |
| SIRT6 | sirtuin 6 | Mus musculus, Sirt6, 1/4 | gain of function | No | 1/1 | No | |
| TERT | telomerase reverse transcriptase | Mus musculus, Tert, 2/3 | gain of function | No | 4/6 | Yes | decreased gene expression |
| TXN | thioredoxin | Mus musculus, Txn1, 2/4 | gain of function | No | 1/1 | No | |
| UCP1 | uncoupling protein 1 (mitochondrial, proton carrier) | Mus musculus, Ucp1, 2/8 | gain of function | No | 1/2 | No | increased gene expression |
| UCP2 | uncoupling protein 2 (mitochondrial, proton carrier) | Mus musculus, Ucp2, 1/2 | gain of function | No | 3/4 | No | |
| VEGFA | vascular endothelial growth factor A | Mus musculus, Vegfa, 1/2 | gain of function | No | 1/4 | No | increased gene expression, conflicting data |

* entries: individual experiments with similar or different line, sex, conditions, genotypes.
** considering experiments on mammals only.
*** cohorts: analysis for different ethnicity, sex or age in one study or groups from different studies.
▴ if available for at least one variant.
Genes for which the data on the effect of longevity polymorphism on gene expression coincide with the results of experiments on model organisms are in bold.
We suggest that some of these genes might be potential targets for human aging therapy. A full description of the genes in the highest confidence group, including experiments on non-mammals, a description of conflicting results, polymorphism IDs, and additional data on age-related gene expression changes in humans, is given in the supplementary materials (Table S1).
For nine of these genes, we also collected data on the effect of longevity-associated polymorphisms on the level of gene expression or protein activity. Interestingly, in eight cases, these data agree with the results of experiments on mammalian model organisms. The results of this analysis are presented in Fig. 4.
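The agreement count can be retraced from the rows of Table 1 that have an entry in the last column. The sketch below is only an illustration: it assumes that "loss of function" agrees with a "decreased" polymorphism effect and "gain of function" with an "increased" one, and it counts the VEGFA entry as agreeing even though the table flags it as conflicting. Whether the agreement was scored exactly this way in the original analysis is not stated in the text.

```python
# Rows copied from Table 1: (gene, effect on gene function that increased
# mammalian lifespan, effect of the longevity-associated polymorphism).
rows = [
    ("AGTR1", "loss of function", "decreased gene expression"),
    ("AKT1", "loss of function", "decreased protein activity"),
    ("GH1", "loss of function", "decreased gene expression"),
    ("IGF1R", "loss of function", "decreased gene expression"),
    ("NFKBIA", "gain of function", "increased gene expression"),
    ("SIRT1", "gain of function", "increased gene expression"),
    ("TERT", "gain of function", "decreased gene expression"),
    ("UCP1", "gain of function", "increased gene expression"),
    ("VEGFA", "gain of function", "increased gene expression, conflicting data"),
]


def is_concordant(function_effect, polymorphism_effect):
    """Assumed rule: loss of function ~ 'decreased', gain of function ~ 'increased'."""
    expected = "decreased" if function_effect == "loss of function" else "increased"
    return polymorphism_effect.startswith(expected)


concordant = [gene for gene, func, poly in rows if is_concordant(func, poly)]
discordant = [gene for gene, func, poly in rows if not is_concordant(func, poly)]
print(len(concordant), discordant)  # 8 ['TERT']
```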
The high confidence group includes all genes that increased mammalian lifespan, excluding those that were also associated with human longevity. The moderate group includes genes that have extended the life of non-mammals. Genes that have extended life only in non-mammals represent a wide field for research on their effect on mammalian lifespan.
Only those genes that met at least two of the six medium priority criteria were included in the low confidence group. It is unclear whether these genes affect lifespan, but there is indirect evidence for their association with aging, so they might be interesting as novel targets for further lifespan experiments. Genes meeting only one criterion with medium priority and/or criteria with low priority were included in the last group, with the lowest confidence. Thus, we obtained a list of 323 human genes that meet the criteria with high priority or at least two different criteria with medium priority (Fig. 3). However, we collected numerous parameters for evaluating experimental quality and results, so users can apply their own analytical methods and scenarios for selecting aging-related genes.
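The grouping rules can be summarized as a small decision function. This is a minimal sketch, not the code used by Open Genes: the criterion labels are hypothetical short names, and treating a reduction of lifespan in any model organism as a single medium-priority criterion (so that six medium-priority criteria result) is an assumption made here.

```python
# Hypothetical labels for the medium-priority criteria (not Open Genes identifiers).
MEDIUM_PRIORITY = {
    "protects_against_age_related_impairment",
    "accelerated_aging_in_humans",
    "reduces_lifespan",  # assumption: counted once, for any model organism
    "age_related_changes_humans",
    "age_related_changes_mammals",
    "longevity_association",
}


def confidence_level(criteria):
    """Map the set of criteria a gene meets to one of the five confidence levels."""
    criteria = set(criteria)
    extends_mammalian = "extends_mammalian_lifespan" in criteria
    if extends_mammalian and "longevity_association" in criteria:
        return "highest"
    if extends_mammalian:
        return "high"
    if "extends_non_mammalian_lifespan" in criteria:
        return "moderate"
    if len(criteria & MEDIUM_PRIORITY) >= 2:
        return "low"
    return "lowest"
```

For example, a gene that meets only `age_related_changes_humans` and `longevity_association` falls into the low confidence group, while a gene that meets only `regulates_aging_associated_genes` (a low-priority criterion) falls into the lowest.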
In addition to research proving the link between gene and aging, we manually collected information about gene evolution and matched genes with hallmarks of aging using GO terms, as described in the following sections.
Data from other databases
We aim to keep all essential genetic data in Open Genes in an accessible format; therefore, Open Genes imports a large share of its information about genes from other databases. We import the full gene name, its synonyms, localization, exon and transcript information, and gene function descriptions from the NCBI, MyGene and HUGO databases6,7,8. We parse data on gene conservation and orthologs from HomoloGene. Orthologs for Caenorhabditis elegans and Drosophila melanogaster are parsed from the WormBase and FlyBase databases, respectively9,10. We also add the protein description, its molecular functions, and related biological processes from UniProt11. Some information is retrieved from the Human Protein Atlas: tissue-specific data on expression levels, protein classes, cell-type and tissue-type specificity, and predicted intra- and extracellular localization of the protein12. We take data on biological processes, molecular functions, and cellular components from the Gene Ontology (GO) database13,14. To link a gene to different aging phenotypes, we use GO terms (details are explained in the section “Association of genes with hallmarks of aging and diseases”). Data on disease associations are obtained from the eDGAR database15.
Impact of gene activity changes on the lifespan and age-related processes
Characterization of the lifespan experiments
The GeneAge database contains more than 2000 genes whose activity alterations affect the lifespan of model organisms, such as mice, worms, fruit flies, and yeast5. We relied on the GeneAge database's approach when selecting experiments to add to our database, but we also took into account that experimental results are not always reproducible and may depend on the experimental procedure. We specify 30 to 40 parameters for each manipulation, depending on study design and data availability. All the experimental parameters that we provide are described on our website in the section “Open Genes data description” (https://open-genes.com/about/articles/open-genes-data-description). Taken together, these parameters allow one to evaluate the quality of an experiment and to interpret its results more completely.
The lifespan of the same line of an organism can vary between laboratories. For instance, according to our data, the median lifespan of control C57BL/6 mice varies from 510 to 921 days in mixed-sex samples. In some cases, the control group has an initially reduced lifespan, and the lifespan increased by genetic manipulation does not exceed the normal life expectancy. It is therefore important to report the absolute lifespans of control and experimental animals in addition to percentage changes. In addition, as discussed below, many interventions lead to inconsistent results in different experiments. The detailed and structured description of all experimental conditions in the Open Genes database makes it possible to run automatic analyses that take multiple parameters into account, to find reasons for contradictions in experimental results, to design experiments, and to search for potential aging therapy targets.
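A short calculation illustrates why percentage changes alone can mislead. The two control values are the extremes quoted above for C57BL/6 mice; the experimental median of 640 days is a made-up number chosen only for illustration.

```python
def percent_change(control_days, experimental_days):
    """Relative change in median lifespan, in percent of the control value."""
    return 100.0 * (experimental_days - control_days) / control_days


short_lived_control = 510  # lowest control median lifespan quoted in the text
long_lived_control = 921   # highest control median lifespan quoted in the text
experimental = 640         # hypothetical experimental median lifespan

gain = percent_change(short_lived_control, experimental)
print(f"{gain:.1f}% relative to its own control")  # 25.5% relative to its own control
print(experimental > long_lived_control)           # False
```

A gain of about 25% looks substantial, yet the treated animals in this hypothetical case still live far shorter than control mice of the same line in another laboratory.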
The Open Genes database already contains 2003 entries for lifespan experiments carried out on model organism orthologs of 248 human genes; 78 and 112 human genes are orthologs of genes that increase mammalian and non-mammalian lifespan, respectively.
Consistency of the lifespan experiments results
The effect of gene activity modification on the lifespan of a model organism is the most direct and obvious evidence of a gene's association with life expectancy. We classified these genes into 5 groups based on the collected data.
The first group includes genes with consistent data on both loss of function (LF) and gain of function (GF) — suppressing the gene activity always extends the lifespan while overexpressing the gene leads to decreased life expectancy, and vice versa. Activation and repression of these genes are well studied, and the effect of an intervention is predictable.
The second and third groups include genes for which several experiments or a single experiment, respectively, consistently show that modulation of gene activity increases life expectancy, while the opposite change in gene activity has not been studied. The fourth group includes genes with conflicting experimental data: for example, LF or GF increased lifespan in some cases and decreased it in others, or both LF and GF increased or decreased lifespan. Finally, the fifth group includes genes whose activity modification leads to a decrease in the lifespan of model organisms as a result of accelerated aging. We performed this analysis for all organisms, and separately for mammals, flies and worms. The results of the analysis are presented in Fig. 5.
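The five groups can be expressed as a simple classification over the list of experiments for one gene. The function below is a sketch under stated assumptions: each experiment is reduced to a (manipulation, effect) pair, experimental conditions are ignored, and the input format and function name are hypothetical rather than taken from Open Genes.

```python
def consistency_group(experiments):
    """Assign a gene to groups 1-5 from (manipulation, effect) pairs.

    manipulation is "LF" or "GF"; effect is "increase" or "decrease".
    """
    if not experiments:
        raise ValueError("at least one experiment is required")

    effects = {"LF": set(), "GF": set()}
    for manipulation, effect in experiments:
        effects[manipulation].add(effect)

    # Group 4: the same manipulation produced both outcomes.
    if any(len(outcomes) == 2 for outcomes in effects.values()):
        return 4

    lf, gf = effects["LF"], effects["GF"]
    if lf and gf:
        # Both directions studied: group 1 only if the outcomes are opposite,
        # otherwise LF and GF changed lifespan in the same direction (group 4).
        return 1 if lf != gf else 4

    observed = lf or gf
    if observed == {"decrease"}:
        return 5  # only lifespan-reducing manipulations are known
    # Consistent increase: several experiments (group 2) or one (group 3).
    return 2 if len(experiments) > 1 else 3
```

For instance, `consistency_group([("GF", "increase"), ("LF", "decrease")])` returns 1, the pattern reported for KL in mammals.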
Mammals showed the smallest share of conflicting results (11%) and flies the largest (49%). It is important to note that some of the conflicting results might be explained by differences in experimental conditions, for example, genotypes, sexes or lines of organisms, tissue specificity, etc. Data on lifespan experiments require a deeper analysis in order to reveal true contradictions. Only eight genes (5%) have consistent results of both gain and loss of function experiments on mammals (Table 2).
Table 2
Genes with most consistent and full data on mammalian lifespan experiments

| Human gene | Lifespan effect in mammals |
|---|---|
| CISD2 | GF increases, LF decreases |
| KL | GF increases, LF decreases |
| PAWR | GF increases, LF decreases |
| PPARG | GF increases, LF decreases |
| PTEN | GF increases, LF decreases |
| SIRT6 | GF increases, LF decreases |
| UCP2 | GF increases, LF decreases |
| IKBKB | LF increases, GF decreases |

GF: gain of function, LF: loss of function
Five of the genes with the most consistent and full data on mammalian lifespan experiments (KL, PPARG, PTEN, SIRT6 and UCP2) were also shown to be associated with human longevity and were included in the “highest confidence” group (Table 1). All human genes, divided into five groups depending on the consistency of the lifespan experiments' results, are shown in the supplementary materials (Table S2).
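The overlap between the two tables can be checked with a set intersection. The gene symbols are copied from Table 1 and Table 2; the variable names are illustrative only.

```python
# Gene symbols listed in Table 1 (highest confidence group).
table1_genes = {
    "ADRA1A", "AGTR1", "AKT1", "GH1", "GHR", "GHRHR", "GRN", "HTT", "IGF1R",
    "INSR", "IRS2", "KL", "MSTN", "NFKBIA", "PAPPA", "PPARG", "PTEN",
    "RPS6KB1", "SIRT1", "SIRT6", "TERT", "TXN", "UCP1", "UCP2", "VEGFA",
}
# Gene symbols listed in Table 2 (consistent GF and LF results in mammals).
table2_genes = {"CISD2", "KL", "PAWR", "PPARG", "PTEN", "SIRT6", "UCP2", "IKBKB"}

shared = sorted(table1_genes & table2_genes)
print(shared)  # ['KL', 'PPARG', 'PTEN', 'SIRT6', 'UCP2']
```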
Effect of genetic manipulation on the age-related processes
Apart from giving detailed data on the extension and reduction of life expectancy, we also show an additional outcome of the intervention: the list of processes that were improved or impaired after genetic manipulation. We also collected data from experiments in which manipulation of the gene affected an age-dependent process, but data on changes in lifespan were not obtained. During the data collection process, we compiled a list of 50 processes, organs, functions or systems that are considered in publications as indicators of aging or are affected as a result of lifespan-changing interventions. At present, the database contains 1421 entries (284 genes) on the effect of gene modification on a particular aging-related alteration. This makes it possible to examine the possible mechanisms by which a genetic manipulation affects lifespan and widens the perspective in the search for target combinations.
Aging-associated changes in gene expression, methylation, and protein activity
Numerous studies demonstrated that aging is connected to transcriptomic16,17,18 and proteomic19 changes, modifications of gene methylation20,21, alterations in the ratio of protein isoforms22,23,24 and their intracellular localization25,26.
To date, we have collected 4672 records on age-related alterations, covering 2115 genes that were shown to change expression, protein activity or methylation level with age in 29 organisms and 224 tissues. Of these, 2965 records and 1951 genes are human, as we focused primarily on obtaining human data (Figure S1).
We focused on establishing a unified data structure so that the data could be used for any research purpose. For this, we defined 13 parameters that are essential for evaluating the quality and results of an experiment (https://open-genes.com/about/articles/open-genes-data-description).
Whenever possible, we showed separate results for each sex and precisely indicated the ages of the compared groups in order to estimate at which age the changes probably occurred. We report results without recalculation, specifying the numerical characteristics and the methods of data collection and statistical analysis.
Open Genes can be used for screening genes, for instance, to determine whether the effect of these genes on lifespan has already been assessed in model organisms, whether these genes are associated with particular aging symptoms, whether these genes are associated with longevity, and much more (see Open Genes instruments for details).
Association of genetic variants and gene expression levels with longevity
Just as with many other phenotypic features, the predisposition to longevity is a partially heritable multifactorial trait; thus, one of the factors increasing an individual's chances of longevity must be a certain combination of alleles27. We collected 1458 records on 362 genes whose association with longevity was studied; 312 of these genes were shown at least once to be significantly associated with longevity or human lifespan. We organized the experimental data into the following structure: sample data, study design, study results. The data were structured using 17 experimental parameters (https://open-genes.com/about/articles/open-genes-data-description).
Apart from showing data on longevity association, we also display the available information on whether the given allelic variant is associated with increased or decreased gene expression or protein activity. For example, the rs1801195 polymorphism in the WRN locus, which is associated with increased gene expression, may increase the probability of longevity28. The finding that WRN gene expression decreases and methylation increases with age in human blood mononuclear cells29 is shown on the WRN gene page in our database and complements the data from the previous study. Data on the effect of longevity polymorphisms on gene expression for genes that have extended life in mammals have been discussed above and are presented in Table 1. Population study data collected and structured in Open Genes can be useful for performing meta-analyses and for determining gene therapy targets.
Regulation of genes associated with aging
While searching for therapy targets, we must take into account that each gene belongs to a complex network of interactions with other genes.
We indicate the direction of the regulator gene's activity towards the regulated gene (activation, suppression, or regulation). These activities can be direct or indirect (regulation through interaction with a mediator protein). Open Genes separately marks gene interactions with unknown mechanisms, namely when it is known only that activity changes of one gene affect the level of the other gene's products. We also indicate the particular way in which the regulator gene acts on the regulated gene, such as gene transcription, translation, protein-protein interaction, protein modification affecting its activity, and others.
Currently, Open Genes contains 642 records on gene interactions. This section in Open Genes is not yet complete and is currently being updated. This study type has a low priority as evidence for the relation between a gene and human aging in Open Genes. However, these data might be useful for the analysis of targets for human aging therapy.
Origin and evolution of genes
To determine the approximate age of a gene family, we used the approach described by Capra et al. in the article “How old is my gene?”30. It is difficult to determine the age of a particular gene because genes keep evolving. Genes do not have a precise age, especially genes with a complicated evolutionary history. The gene age is usually related to an important event in its evolution, such as duplication, horizontal transfer, or de novo origin30.
For studying the genetics of aging, a gene is important within the context of related processes or biological mechanisms. Therefore, we defined the age of the gene family as the most evolutionarily ancient taxon that is known to have a homologous gene with a similar function. Currently, Open Genes has 455 manually collected entries on gene evolution. Most gene families originated in metazoans (120), prokaryotes (102), and eukaryotes (101) (Figure S2). We determine the origin of the gene family manually by analyzing studies on molecular evolution. We also indicate the origin of a particular gene if such data are available. The gene page includes a comment on the evolutionary age of the gene and gene family, which provides a brief rationale with literature references and a description of the main events in gene evolution. Apart from the manually collected data, we parse sequence conservation data from HomoloGene NCBI6.
Information about the origin and evolution of genes is important for understanding the evolution of aging hallmarks, which was described in a recent publication31. Also, consideration of the evolution of genes and molecular pathways might be useful for discovering potential therapeutic targets among aging-associated genes in animal models.
Association of genes with hallmarks of aging and diseases
López-Otín et al. distinguished 9 hallmarks of aging – mechanisms that mutually participate in human aging32. Later, based on these 9 “synthetic” hallmarks Maël Lemoine identified 20 “analytical” hallmarks of aging31. Each hallmark represents a vital biological process that involves multiple associated genes and is disrupted during aging.
We decided to establish the relationship between the genes in our database and the biological processes whose dysregulation characterizes human aging. We used the Gene Ontology tree of biological processes and molecular functions to link each gene to a particular hallmark of aging. We manually matched each hallmark of aging with the corresponding nodes of the Gene Ontology process and function trees. A hallmark of aging was considered to be associated with a gene if the gene was associated with one or more GO terms corresponding to a tree node or its child branches. Currently, each new gene in Open Genes is automatically assigned to one or multiple hallmarks of aging based on the GO terms associated with that gene. For example, the TERT gene is associated with the “telomere attrition” aging hallmark because it participates in the following Gene Ontology biological processes: “telomere maintenance” and “telomere maintenance via telomerase” (Figure S3).
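The matching rule (a gene is linked to a hallmark if any of its GO terms is a mapped node or lies below one) can be sketched as follows. The tree fragment is a toy simplification, the hallmark-to-node mapping contains only the TERT example from the text, and all names in the code are hypothetical; the actual mapping is the one given in Table S3.

```python
# Toy fragment of a GO-like tree (parent -> children); the real Gene Ontology
# graph is much larger and contains intermediate terms omitted here.
CHILDREN = {
    "telomere maintenance": ["telomere maintenance via telomerase"],
    "telomere maintenance via telomerase": [],
}

# Hypothetical manual mapping: hallmark of aging -> GO nodes chosen as roots.
HALLMARK_NODES = {"telomere attrition": ["telomere maintenance"]}


def descendants(term):
    """Return the term together with every term below it in the tree."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(CHILDREN.get(current, []))
    return found


def hallmarks_for(gene_go_terms):
    """List hallmarks whose mapped nodes (or their descendants) match the gene's terms."""
    gene_terms = set(gene_go_terms)
    return sorted(
        hallmark
        for hallmark, roots in HALLMARK_NODES.items()
        if any(gene_terms & descendants(root) for root in roots)
    )


print(hallmarks_for(["telomere maintenance via telomerase"]))  # ['telomere attrition']
```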
Thus, we show putative associations of a gene with a hallmark of aging, based on the gene's involvement in the corresponding biological process. We used the list of 20 aging hallmarks proposed by Lemoine31 and supplemented it with another 2 hallmarks of aging suggested in a recent publication33. A complete list of hallmarks of aging and their corresponding nodes on the Gene Ontology tree is presented in the supplementary materials (Table S3).
For a database user, a list of diseases linked to individual genes is probably crucial information. Open Genes obtains disease association data from the eDGAR database15. We show associations with disease categories in addition to associations with individual diseases. To do this, we categorized all diseases from the eDGAR database according to the nodes of the ICD-10 tree.
Filters and search
While developing Open Genes we focused on making the search for any genetics-of-aging-related data as simple as possible. The database main table presents the following data for each gene:
- HUGO symbol and gene name
- Criteria for adding a gene to the database
- Confidence levels of the genes
- Number of entries in each study type
- Association with hallmarks of aging
- Association with diseases and disease categories according to ICD-10
- Protein classification according to Human Protein Atlas
- The origin of the gene and gene family
- Conservation according to HomoloGene
The genes in the table can be filtered by each parameter individually and by a combination of parameters (Fig. 6). Search by HGNC symbol, name, Ensembl ID, and GO terms is available in Open Genes. The website also has an option to search by a gene list, which allows users to check which genes from their list are included in Open Genes and what is known about their connection to aging.
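The same kind of filtering and gene-list lookup can be reproduced offline on a downloaded table. The sketch below works on an in-memory list of rows; the field names, the function names and the three example rows are assumptions for illustration and do not reflect the actual schema of the Open Genes tables.

```python
# Three example rows; the confidence labels follow the groups described in the text
# (KL and TERT appear in Table 1, CISD2 only in Table 2). Field names are hypothetical.
genes = [
    {"symbol": "KL", "confidence": "highest"},
    {"symbol": "TERT", "confidence": "highest"},
    {"symbol": "CISD2", "confidence": "high"},
]


def filter_genes(table, **conditions):
    """Keep rows whose fields equal every given condition."""
    return [
        row for row in table
        if all(row.get(field) == value for field, value in conditions.items())
    ]


def check_gene_list(table, symbols):
    """Report, for each symbol in a user's list, whether it is present in the table."""
    known = {row["symbol"] for row in table}
    return {symbol: symbol in known for symbol in symbols}


highest = [row["symbol"] for row in filter_genes(genes, confidence="highest")]
print(highest)                                    # ['KL', 'TERT']
print(check_gene_list(genes, ["KL", "NOT_A_GENE"]))  # {'KL': True, 'NOT_A_GENE': False}
```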
There are separate tables for each type of study, which can be downloaded by the user (https://open-genes.com/download). Each table links to the gene page, which contains all the data about the gene. The page of an individual gene contains all the information, both manually collected and added from other databases, including tables with the details of all experiments and a description of the evolution of the gene.