Open Genes is a resource that brings together a vast amount of manually collected data on experimental evidence for gene-aging associations, along with general information on genes, their functions, and their associations with disease parsed from other databases. An overview of Open Genes is presented in Figure 1. The following sections provide a detailed characterization of the data manually collected from the literature and obtained from third-party biological databases.
Genes Selection and Classification
Genes Selection Criteria
The Open Genes database aims to collect all available data on the genetics of aging and structure it for analysis and searching for aging therapy targets. All data on experimental evidence of the link between genes and aging are collected manually from the literature.
We collected several types of data regarding each gene. We distinguished 6 types of studies and 12 criteria for adding genes to the Open Genes database:
- Changes in gene activity affect the model organism’s lifespan:
  - Changes in gene activity extend the mammalian lifespan
  - Changes in gene activity extend the non-mammalian lifespan
  - Changes in gene activity reduce the mammalian lifespan
  - Changes in gene activity reduce the non-mammalian lifespan
- Age-related changes in gene expression, methylation or protein activity:
  - Age-related changes in humans
  - Age-related changes in mammals
  - Age-related changes in non-mammals
- Changes in gene activity affect the age-related process:
  - Changes in gene activity protect against the age-related impairment
  - Changes in gene activity enhance the age-related deterioration
- Association of gene variants or expression levels with longevity
- Association of the gene with accelerated aging in humans
- Regulation of genes associated with aging
By “changing the gene activity”, we refer to any specific effect on a gene or its product that affects its transcription, translation, or the activity or stability of its product. A particular intervention method, including gene knockout, RNA interference, additional copies of a gene, therapy with a vector containing a dominant-negative version of the gene, treatment with a protein agonist and others, is indicated for each experiment.
The gene was added to the Open Genes database if it met at least one of the 12 criteria. Figure 2 shows the Open Genes database statistics. Open Genes currently contains 2402 genes associated with aging.
Filters on the main genes table (https://open-genes.com/genes) of the database allow users to generate lists of genes according to selection criteria, combination of criteria and many other parameters.
Ranking the Criteria
Meeting a single one of these 12 criteria is not sufficient to establish a link between a gene and human longevity. Therefore, we assigned the criteria a high, medium, or low priority (Figure 3).
It is unclear whether reversing the activity change of a gene whose manipulation shortens lifespan will lead to an increase in life expectancy. We believe that genetic manipulations that increase lifespan provide stronger evidence for considering genes as aging therapy targets than manipulations that decrease lifespan. Depending on their results, lifespan experiments were therefore assigned different levels of priority. A high priority was placed on evidence related to the increase in the lifespan of model organisms.
Medium priority was given to data on protection against age-related impairment, the association of a gene with accelerated aging in humans, the decrease in the lifespan of model organisms, age-related changes in gene expression in humans and mammals, and associations of gene variants with longevity. All other criteria, such as deterioration in age-related processes, regulation of other age-related genes and age-related changes in non-mammals, were assigned a low priority.
Division of Aging-Associated Genes to Different Confidence Levels
We divided all genes into five confidence levels: highest, high, moderate, low and lowest (Figure 3). The group of genes with the highest confidence level includes 25 human genes that increase the lifespan of mammals and are known to have variants associated with human longevity.
We proceeded under the assumption that it remains unclear whether results from model organisms can be extrapolated to humans. At the same time, population studies of the genetics of longevity often show conflicting results and can only suggest the influence of a particular gene on the predisposition to longevity. Therefore, we used data on the association with longevity as an additional parameter for genes that extended the mammalian lifespan. Table 1 shows Open Genes data on these 25 genes. We suggest that some of these genes might be potential targets for human aging therapy.
The high-confidence group includes all genes that increased mammalian lifespan, excluding those that were also associated with human longevity. The moderate-confidence group includes genes that have extended the lifespan of non-mammals. Genes that have extended life only in non-mammals represent a wide field for research on their effect on mammalian lifespan.
Only those genes that met at least two of the six medium-priority criteria were included in the low-confidence group. It is unclear whether these genes affect lifespan, but there is indirect evidence for their association with aging; consequently, they might be interesting as novel targets for further lifespan experiments. Genes meeting only one of the medium-priority criteria and/or low-priority criteria were included in the last group, with the lowest confidence. Thus, we obtained a list of 323 human genes that meet the high-priority criteria or at least two different medium-priority criteria (Figure 3). We added a filter that allows sorting genes according to the confidence level of their association with human aging (https://open-genes.com/).
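The assignment rules described above can be sketched as a small decision function. This is only an illustrative sketch, not the Open Genes implementation: the criterion labels are hypothetical identifiers, and lifespan reduction is treated as a single medium-priority criterion so that the set has six members, which is an assumption about how the criteria are grouped.

```python
# Hypothetical criterion labels; these are not Open Genes field names.
EXTENDS_MAMMAL = "extends_mammalian_lifespan"
EXTENDS_NON_MAMMAL = "extends_non_mammalian_lifespan"
LONGEVITY = "longevity_association"

# Medium-priority criteria as read from the priority description.
# Assumption: lifespan reduction counts as one criterion.
MEDIUM_PRIORITY = frozenset({
    "protects_against_age_related_impairment",
    "accelerated_aging_in_humans",
    "reduces_lifespan",
    "age_related_changes_in_humans",
    "age_related_changes_in_mammals",
    LONGEVITY,
})


def confidence_level(criteria):
    """Return the confidence level for the set of criteria a gene meets."""
    criteria = set(criteria)
    if EXTENDS_MAMMAL in criteria:
        # Mammalian lifespan extension plus a longevity association
        # is required for the highest level.
        return "highest" if LONGEVITY in criteria else "high"
    if EXTENDS_NON_MAMMAL in criteria:
        return "moderate"
    if len(criteria & MEDIUM_PRIORITY) >= 2:
        return "low"
    # One medium-priority criterion and/or low-priority criteria only.
    return "lowest"


print(confidence_level({EXTENDS_MAMMAL, LONGEVITY}))
print(confidence_level({LONGEVITY, "age_related_changes_in_humans"}))
```

Under these assumptions, a gene with a longevity association but no lifespan-extension evidence in mammals can reach at most the low-confidence group.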
Our classification has limitations. For instance, the genes FOXO3 and APOE, whose association with human longevity has been confirmed in numerous population studies, are not in the “highest” and “high” groups due to the lack of experimental evidence demonstrating their ability to increase mammalian lifespan. Downregulation of another well-known gene, mTOR, increased the lifespan of all studied model organisms (worms, flies and mice), but no statistically significant associations of this gene with human longevity were found. Of course, this does not rule out the possibility that mTOR is associated with human longevity, and this gene was assigned to the group of genes with a high confidence level. Thus, genes with the highest confidence level form a special group that simultaneously meets two conditions: they extended the lifespan of animals and have been shown to be associated with human longevity.
However, we collected numerous experimental parameters and provided various filters for the genes; therefore, users can apply their own analytical methods and scenarios for selecting aging-related genes.
Open Genes Data
Impact of Gene Activity Changes on the Lifespan and Age-Related Processes
When collecting data on the effects of genetic manipulations on model organisms’ lifespans, we accounted for the fact that experiment outcomes depend on the experimental procedure. Depending on study design and data availability, up to 35 parameters are specified for each manipulation, including absolute values of lifespans of control and experimental groups, tissue specificity of an intervention, size of control and experimental groups, and the animals’ housing conditions (temperature, diet, number of animals in a container, cage, or plate). All the parameters of the experiments are described on our website in the section “Open Genes data description” (https://open-genes.com/about/articles/open-genes-data-description). A combination of these parameters permits a more comprehensive interpretation of results.
The lifespans of the same line of an organism can vary between different laboratories. For instance, according to our data, the median lifespan of control C57BL/6 mice varies from just over 500 days to more than 900 days in mixed-sex samples7,8. In some cases, the control group is short-lived to begin with, so the lifespan increase achieved through genetic manipulation does not exceed the normal life expectancy. Therefore, it is important to report the absolute values of control and experimental animals' lifespans in addition to percentage changes. In addition, as will be discussed below, many interventions lead to inconsistent results in different experiments. A detailed and structured description of all experimental conditions in the Open Genes database enables comparison of lifespan experiments and a more accurate interpretation of the results.
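A short worked example shows why a percentage change alone can mislead. The numbers below are illustrative only and are not taken from Open Genes records; they merely fall within the range of control lifespans mentioned above.

```python
def percent_change(control_days, experimental_days):
    """Relative change in median lifespan, in percent of the control value."""
    return (experimental_days - control_days) * 100.0 / control_days


# Illustrative values (assumptions, not database entries).
short_lived_control = 500.0   # median lifespan of a short-lived control group
treated = 600.0               # median lifespan after the intervention
long_lived_control = 900.0    # control median reported by another laboratory

change = percent_change(short_lived_control, treated)
exceeds_long_lived_control = treated > long_lived_control

print(f"change: {change:.1f}%")
print(f"exceeds the long-lived control: {exceeds_long_lived_control}")
```

Here the intervention yields a sizeable relative increase, yet the treated animals still live less than an untreated control group from another laboratory, which is only visible when absolute values are reported.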
The Open Genes database already contains 1975 entries for lifespan experiments related to the 247 human genes that are orthologs of model organisms’ genes, and 78 and 108 human genes are orthologs of genes that increase mammalian and non-mammalian lifespan, respectively. The structured data from the lifespan experiments are available on the Open Genes download page (https://open-genes.com/download).
In addition to providing detailed data on the extension and reduction of life expectancy, we also provide an additional outcome of the intervention: the list of processes that were improved or impaired after genetic manipulation. We also collected data from experiments in which manipulation of the gene affected an age-dependent process but data on changes in lifespan were not obtained. During the data collection process, we compiled a list of 50 processes, organs, functions, or systems that have been cited in the literature as indicators of aging or are affected as a result of lifespan-changing interventions. Currently, the database contains 1390 entries (273 genes) on the effect of gene modification on a particular aging-related alteration. This allows for the observation of possible mechanisms by which genetic manipulation affects lifespan and widens the perspective in the search for target combinations. The data are available for download at https://open-genes.com/download.
Aging-Associated Changes in Gene Expression, Methylation, and Protein Activity
Numerous studies have demonstrated that aging is connected to transcriptomic9,10,11 and proteomic12 changes, modifications of gene methylation13,14, and alterations in the ratio of protein isoforms15,16,17 and their intracellular localization18,19.
Currently, we have 4644 records on age-related alterations and 2116 genes that were shown to change expression, protein activity, or methylation level with age in 29 organisms and 224 tissues. Of these, 2965 records and 1951 genes are human, as we focused primarily on obtaining human data.
We focused on establishing a unified data structure so that it could be universally utilized for all research purposes. For this, 13 parameters were assigned, which are essential for evaluating the quality and results of an experiment (https://open-genes.com/about/articles/open-genes-data-description).
When possible, separate results for each sex were shown along with the precise ages of the compared groups to estimate the ages at which changes most likely occurred. We present results without recalculation but specify the numerical characteristics and methods for data collection and statistical analysis.
Open Genes can be used for screening genes, for instance, to determine whether their effect on lifespan has already been assessed in model organisms, whether they are associated with aging symptoms, and whether they are associated with longevity (see Open Genes instruments for more detail). The data are available on the Open Genes download page (https://open-genes.com/download).
Association of Genetic Variants and Gene Expression Levels with Longevity
As with many other phenotypic features, the predisposition to longevity is a partially inherited multifactorial trait; thus, one of the factors increasing the individual’s longevity chances must be a certain combination of alleles20. We collected 1458 records on 362 genes whose association with longevity was studied; at least 312 genes were shown to be significantly associated with longevity or human lifespan. We organized the experiment data into the following structures: sample data, study design, and study results. The data were structured using 17 experimental parameters (https://open-genes.com/about/articles/open-genes-data-description).
In addition to showing data on longevity association, we also show the available information on whether the given allelic variant is associated with increased or decreased gene expression or protein activity. For example, the rs1801195 polymorphism in the WRN locus, which is associated with increased gene expression, may increase the probability of longevity21. The finding that WRN gene expression decreases and methylation increases with age in human blood mononuclear cells22 is presented on the page for the WRN gene in our database and complements data from the previous study. Data on the effect of longevity polymorphisms on gene expression for genes that have extended life in mammals have been discussed above and are presented in Table 1. Population study data collected and structured in Open Genes can be useful for performing meta-analyses and identifying gene therapy targets. The structured data from population studies of longevity are available on the Open Genes website (https://open-genes.com/download).
Origin and Evolution of Genes
Information about the origin and evolution of genes is essential for understanding the evolution of aging hallmarks, which was described in a recent publication23. We utilized the method described by Capra et al. in their article “How old is my gene?” to estimate the approximate age of the gene and gene family24. It is difficult to determine the age of a particular gene, as genes are constantly evolving. Genes do not have precise ages, especially genes with a complex evolutionary history. The gene age is usually associated with an important event in its evolution, such as duplication, horizontal transfer, or de novo origin24.
Studying the genetics of aging requires considering a gene in the context of related processes or biological mechanisms. Therefore, we considered the age of the gene family based on the most evolutionarily ancient taxon that is known to possess a homologous gene with a similar function. Currently, Open Genes has 455 genes with manually collected data on their evolution. Most gene families have been discovered in metazoans (120), prokaryotes (102), and eukaryotes (101). We manually determined the origin of the gene family by analyzing studies on molecular evolution. If such information was available, we also indicated the origin of each gene. On the gene page, there is a comment on the evolutionary age of the gene and gene family, which provides a brief rationale with literature references, as well as a description of the main events in gene evolution. In addition to the manually collected data, sequence conservation data were parsed from HomoloGene NCBI6. The data are available at https://open-genes.com/download.
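The rule of dating a gene family by the most ancient taxon with a known homolog can be illustrated as follows. This is a minimal sketch: the coarse taxon ordering and the function name are assumptions made for illustration, whereas the origins in Open Genes were determined manually from the literature.

```python
# Taxa ordered from most ancient to most recent (coarse, illustrative ordering).
TAXON_ORDER = ["prokaryotes", "eukaryotes", "metazoans", "vertebrates", "mammals"]


def gene_family_origin(taxa_with_homolog):
    """Return the most ancient taxon in which a homolog with a similar
    function is known, or None if none of the listed taxa is recognized."""
    known = set(taxa_with_homolog)
    for taxon in TAXON_ORDER:  # first match is the most ancient one
        if taxon in known:
            return taxon
    return None


print(gene_family_origin({"mammals", "metazoans", "vertebrates"}))
```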
Consideration of the evolution of genes and molecular pathways might be useful for discovering the potential therapeutic targets among aging-associated genes in animal models.
Association of Genes with Hallmarks of Aging
López-Otín et al. distinguished nine hallmarks of aging, or mechanisms that mutually participate in human aging25. Based on these nine “synthetic” hallmarks, Maël Lemoine identified 20 “analytical” hallmarks of aging23. In a recent publication, López-Otín et al. described the nine hallmarks of aging in greater detail and added three new ones26. Each hallmark represents a vital biological process that involves multiple associated genes and is disrupted during aging.
We chose to establish a connection between the genes in our database and the biological processes whose dysregulation characterizes human aging. We used the Gene Ontology tree of biological processes and molecular functions to link each gene to a particular hallmark of aging. Each hallmark of aging was manually matched with the corresponding nodes of the GO process and function trees. Currently, each new gene in Open Genes is automatically assigned to one or multiple hallmarks of aging based on the GO terms associated with that gene. For example, the TERT gene is associated with the “telomere attrition” aging hallmark since it participates in the GO biological processes “telomere maintenance” and “telomere maintenance via telomerase”. Genes can be filtered by hallmarks of aging on the Open Genes homepage (https://open-genes.com/).
Thus, putative associations between a gene and a hallmark of aging are shown based on the gene’s involvement in the corresponding biological process. We considered 20 aging hallmarks offered by Lemoine23, added “disabled macroautophagy” from the last review by López-Otín et al.26, and completed this list with two additional hallmarks of aging suggested in the recent publication of Pun F. W. et al.27 A complete list of hallmarks of aging and their corresponding nodes on the GO tree is presented in Supplementary Table S1.
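The automatic assignment of a gene to hallmarks through its GO terms can be sketched as a lookup over the GO tree. The mapping and the hierarchy fragment below are a simplified toy excerpt built around the TERT example, and matching a gene's term against hallmark nodes through its ancestors is an assumption about how the tree is traversed, not a description of the actual Open Genes code.

```python
# Hallmark -> GO nodes manually matched to it (toy excerpt).
HALLMARK_NODES = {
    "telomere attrition": {"telomere maintenance"},
}

# Child term -> parent terms (simplified fragment of the GO hierarchy).
GO_PARENTS = {
    "telomere maintenance via telomerase": {
        "telomere maintenance via telomere lengthening",
    },
    "telomere maintenance via telomere lengthening": {"telomere maintenance"},
}


def ancestors(term, parents):
    """Collect all ancestors of a GO term by walking child -> parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def hallmarks_for_gene(go_terms, hallmark_nodes=HALLMARK_NODES, parents=GO_PARENTS):
    """Return the hallmarks whose GO nodes equal or subsume any term of the gene."""
    reachable = set()
    for term in go_terms:
        reachable.add(term)
        reachable |= ancestors(term, parents)
    return sorted(h for h, nodes in hallmark_nodes.items() if nodes & reachable)


print(hallmarks_for_gene({"telomere maintenance via telomerase"}))
```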
Filters and Searching
During the development of Open Genes, we aimed to simplify the search for any data pertaining to the genetics of aging. The main table of the database provides the following information for each gene:
- HUGO symbol and gene name
- Criteria for adding a gene to the database
- Confidence levels of the genes
- Number of entries in each study type
- Association with hallmarks of aging
- Association with diseases and disease categories according to ICD-10
- Protein classification according to the Human Protein Atlas
- The origin of the gene and the gene family
- Conservation according to HomoloGene NCBI
The genes in the table can be filtered by each parameter individually or by a combination of parameters (Figure 4). Search by HGNC, name, Ensembl ID, and GO terms is available in Open Genes. The website also provides an option to search by a gene list, allowing users to determine which of the genes on their list are included in Open Genes and what is known about their connection to aging.
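Conceptually, the gene-list search is a membership check of the user's symbols against the set of genes in the database. The sketch below assumes a locally available set of symbols and case-insensitive matching; the function name, the tiny symbol set, and the placeholder symbol are hypothetical, and the website's own matching rules may differ.

```python
def split_gene_list(user_genes, database_symbols):
    """Split a user's gene list into symbols present in and absent from
    the database, preserving the user's order."""
    known = {symbol.upper() for symbol in database_symbols}
    found, missing = [], []
    for gene in user_genes:
        symbol = gene.strip().upper()
        if not symbol:
            continue  # skip empty entries
        (found if symbol in known else missing).append(symbol)
    return found, missing


# A few gene symbols mentioned in this article stand in for the database.
database_symbols = {"TERT", "FOXO3", "APOE", "SIRT1"}
found, missing = split_gene_list(["tert", "SIRT1 ", "NOTAGENE1"], database_symbols)
print(found, missing)
```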
The user can download separate tables for each type of study from https://open-genes.com/download. The page for each gene contains all the information, both manually collected and added from other databases, including tables with all experiments’ details and a description of the gene’s evolution.
Data from Other Databases
Open Genes imports a significant amount of information about genes from other databases in an effort to maintain all essential genetic data in the most accessible format. General data for the genes are automatically obtained monthly from third-party database APIs (https://open-genes.com/about/articles/reference-list). We import the gene name, its synonyms, transcript information, as well as gene description, orthologs in vertebrates, and conservation from the NCBI, NCBI HomoloGene, and MyGene databases28,29,30. Gene location on chromosomes is obtained from the HUGO database31. Orthologs for Caenorhabditis elegans and Drosophila melanogaster are extracted from the WormBase and FlyBase databases, respectively32,33. In addition, the protein description is included from UniProt34. Some information is retrieved from the Human Protein Atlas, including tissue-specific data on gene expression levels, protein classes, cell-type and tissue-type specificity, and the predicted intracellular and extracellular localization of the protein35. We use the GO database to retrieve information regarding biological processes, molecular functions, and cellular components36,37. To link a gene with different hallmarks of aging, we use the GO terms tree provided by QuickGO REST APIs38. Data about disease associations are obtained from the eDGAR database39. We obtain ICD codes for diseases associated with genes and the ICD diseases tree from the Orphanet database and the ICD-11 classification40,41.
Analyzing Open Genes Data
Using the Open Genes data, we analyzed the consistency between lifespan experiments performed on mammals and population studies of human longevity. We also classified the genes according to the degree of inconsistency in the results of lifespan experiments on different animal models. By using similar scenarios to analyze the experimental data that we have collected, users can obtain lists of genes of interest that correspond to certain parameters.
Effect of Gene Activity on Mammalian Lifespan and Longevity in Humans
For the 25 genes with the highest confidence level (Table 1), we analyzed the effect of polymorphisms associated with human longevity on gene expression, where such data were available. The results of this analysis are presented in Figure 5.
For 10 of the “highest confidence level” genes, there are data on the effect of longevity-associated polymorphisms on the level of gene expression or protein activity. Interestingly, in eight instances, these data agree with the results of experiments on mammalian model organisms. Centenarians are carriers of variants of genes associated with increased gene expression, and upregulation of these genes extends the lifespan of animals (NFKBIA, SIRT1, UCP1, and VEGFA). Conversely, genes that must be suppressed for animal longevity are downregulated in centenarians (AGTR1, AKT1, GH1, and IGF1R).
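The agreement check described above amounts to comparing two directions per gene: the direction in which longevity-associated variants shift expression, and the direction of the activity change that extended lifespan in mammals. The sketch below encodes the eight concordant genes named in the text; `GENE_X` is a hypothetical placeholder added only to show a discordant case.

```python
UP, DOWN = "up", "down"

# Direction of the activity change that extended lifespan in mammalian models.
lifespan_extending_change = {
    "NFKBIA": UP, "SIRT1": UP, "UCP1": UP, "VEGFA": UP,
    "AGTR1": DOWN, "AKT1": DOWN, "GH1": DOWN, "IGF1R": DOWN,
    "GENE_X": UP,  # hypothetical placeholder, not a real record
}

# Effect of longevity-associated variants on expression or protein activity.
variant_effect = {
    "NFKBIA": UP, "SIRT1": UP, "UCP1": UP, "VEGFA": UP,
    "AGTR1": DOWN, "AKT1": DOWN, "GH1": DOWN, "IGF1R": DOWN,
    "GENE_X": DOWN,  # hypothetical placeholder, not a real record
}


def concordant_genes(model_data, human_data):
    """Genes for which both sources point in the same direction."""
    shared = model_data.keys() & human_data.keys()
    return sorted(g for g in shared if model_data[g] == human_data[g])


agreeing = concordant_genes(lifespan_extending_change, variant_effect)
print(len(agreeing), agreeing)
```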
Thus, using a variety of parameters specified for each gene and experiment in the database, users can narrow down groups of genes to a few genes and isolate special genes of interest.
Consistency of the Lifespan Experiments Results
The effect of gene activity modification on the lifespan of a model organism is the most direct and obvious evidence of the gene's association with life expectancy. Using Open Genes data, we classified genes into six groups based on the effect of loss (LF) or gain (GF) of gene function on the lifespan of model organisms. By “loss of gene function”, we refer to interventions that decrease gene expression or gene product activity, such as gene knockout or knockdown, RNA interference or therapy with a specific gene product inhibitor. “Gain of function” includes overexpression of a gene with additional copies of the gene or therapy with a vector expressing the given gene, a mutation that increases the stability or activity of the gene product, and therapy with specific inducers of the gene product.
The first group includes genes with consistent data on both LF and GF: suppressing the gene activity always extends the lifespan, whereas overexpressing the gene leads to decreased life expectancy, and vice versa. Activation and repression of these genes are well studied, and the effect of an intervention is predictable.
The second and third groups include genes for which there are consistent results from several experiments or a single experiment, respectively, indicating that gene activity modulation increases life expectancy; however, the reversal of gene activity has not been studied. The fourth group includes genes with conflicting experimental data; for example, in some cases, LF or GF increased lifespan, while in others it decreased lifespan. The fifth group includes genes whose activity modification leads to a decrease in the lifespan of model organisms as a result of accelerated aging. Finally, genes were included in the sixth group if any change in their activity, either a loss or gain of function, reduced the lifespan of model organisms. It can be assumed that any modification of such genes is not beneficial since their activity is already optimal for longevity; therefore, we called this group of genes "optimal". We performed this analysis for a combined group comprising all organisms, as well as separately for mammals, flies, and worms. The results of the analysis are presented in Figure 6.
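The six groups can be expressed as a decision procedure over a gene's lifespan experiments. This is a minimal sketch under stated assumptions: each experiment is reduced to a hypothetical pair of intervention kind (`"LF"` or `"GF"`) and effect (`"increase"` or `"decrease"`), experiments without an effect are ignored, and the case in which both LF and GF extend lifespan, which is not covered by the description, is treated as conflicting.

```python
from collections import defaultdict


def consistency_group(experiments):
    """Classify a gene from (kind, effect) pairs, where kind is "LF" or "GF"
    and effect is "increase" or "decrease" (of lifespan)."""
    effects = defaultdict(list)
    for kind, effect in experiments:
        if kind not in ("LF", "GF") or effect not in ("increase", "decrease"):
            raise ValueError(f"unexpected experiment: {(kind, effect)!r}")
        effects[kind].append(effect)

    lf, gf = set(effects["LF"]), set(effects["GF"])
    if len(lf) > 1 or len(gf) > 1:
        return "conflicting"               # group 4
    if lf and gf:
        if lf != gf:
            return "consistent LF and GF"  # group 1
        if lf == {"decrease"}:
            return "optimal"               # group 6
        return "conflicting"               # assumption: both LF and GF extend lifespan
    observed = lf or gf
    if observed == {"increase"}:
        total = len(effects["LF"]) + len(effects["GF"])
        # groups 2 and 3: reversal of gene activity not studied
        return "consistent, several experiments" if total > 1 else "single experiment"
    if observed == {"decrease"}:
        return "reduces lifespan"          # group 5
    return "unclassified"                  # no usable experiments


print(consistency_group([("LF", "increase"), ("GF", "decrease")]))
print(consistency_group([("LF", "decrease"), ("GF", "decrease")]))
```

Running the same function on experiments restricted to one class of organisms would give the per-organism breakdown, provided the records carry an organism label to filter on.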
The fewest conflicting results were obtained in mammals (9.8%), and the most in flies (45.1%). It is important to note that some of the conflicting results might be explained by differences in experimental conditions, for example, the genotypes obtained, sexes or lines of organisms, tissue specificity, etc. Data on lifespan experiments require a deeper analysis to reveal true contradictions. Only eight genes (5%) have consistent results in both gain- and loss-of-function experiments on mammals (Table 2).
Five of the genes with the most consistent and complete data on mammalian lifespan experiments, KL, PPARG, PTEN, SIRT6, and UCP2, were also shown to be associated with human longevity and were included in the “highest confidence” group (Table 1). All human genes classified into six groups based on the consistency of the lifespan experiment results are presented in Supplementary Table S2.
The ratio of consistent and conflicting results allows us to determine which model organisms are more suitable for lifespan experiments. Most of the conflicting results were obtained in experiments with flies. This may indicate that the lifespan of flies is highly dependent on various factors, many of which can be difficult to control. A deeper analysis of contradictions, taking into account such parameters as genotype, sex, temperature, tissue specificity, and others, can be useful to determine what factors affect the results of the experiment.