2.1 Overall growth of the database
MethMotif 2023 currently integrates a large number of ChIP-seq and WGBS datasets from various sources, including ENCODE (Davis et al., 2018), GEO (Clough and Barrett, 2016), and GTRD (Yevshin et al., 2019), further complemented with in-house experiments, reaching a total of 702 methyl-PWMs across three taxonomic groups. For Homo sapiens, sixteen cell lines are covered, including five new ones (H9, HEK293T, HUES64, LNCaP, SNU398), which enables us to better characterize cell-type-specific TF function (Table 1).
Table 1
Expansion overview of MethMotif 2022 compared to the previous release
| | Release | Human | Mouse | Arabidopsis |
|---|---|---|---|---|
| Number of TFs | 2018 | 509 | 0 | 0 |
| Number of TFs | 2022 | 655 | 24 | 23 |
| Cell lines/Tissues | 2018 | 11 | 0 | 0 |
| Cell lines/Tissues | 2022 | 16 | 1 | 1 |
| ChIP-seq experiments | 2018 | 2178 | 0 | 0 |
| ChIP-seq experiments | 2022 | 2473 | 78 | 74 |
| WGBS experiments | 2018 | 16 | 0 | 0 |
| WGBS experiments | 2022 | 22 | 1 | 10 |
2.2 Inclusion of binding motifs for transcriptional cofactors
Cooperative binding of TFs plays an essential role in the regulation of gene expression. Previous studies have shown that TF–TF cooperativity largely depends on various factors, such as DNA motif, methylation, orientation, and spacing preferences (Jolma et al., 2015). However, to date, there is no systematic approach to characterize TF binding partnerships based on DNA methylation status, nor to predict the effect of TF binding modules on CpG methylation levels. To address this problem, we have developed a novel tool that exploits our data compendium to reveal and rank the cofactors co-binding with a specific TF of interest (Lin et al., 2020). For each human transcription factor, MethMotif 2023 includes a cofactor report that summarizes the TF-binding partnership in four categories: i) the co-binding score, ii) the binding motif of the co-binding TFBS, iii) the methylation status together with the CG percentages within the two motifs as well as within the 200 bp region surrounding the motif, and iv) the read enrichment score (Fig. 1A). Furthermore, a detailed cofactor MethMotif logo displays the methylation level of each base pair within the binding site, together with the gene location and a gene ontology annotation for each TF–TF pair to characterize each cooperation (Fig. 1B). Finally, the global genomic location distribution (Fig. 1C) and the GO enrichment of TF targets (Fig. 1D), generated by HOMER (Heinz et al., 2010) and GREAT (McLean et al., 2010), respectively, are provided for each TF and TF–TF pair.
2.3 Introduction of CHG and CHH methylation contexts in Arabidopsis thaliana
In contrast to mammalian cells, plants carry DNA cytosine methylation regardless of the context. More precisely, here, DNA methylation can occur within symmetrical CG and CHG (where H stands for A, T, or C) di- and tri-nucleotides, respectively (i.e., cytosine methylation can be present on both strands), and within asymmetrical CHH tri-nucleotides (i.e., cytosine methylation can be present on only one strand) (Grimanelli and Ingouff, 2020). Considering the distinct DNA methylation patterns in plants compared to other taxonomic groups, we refined the original MethMotif logo to enable the distinction between symmetrical, asymmetrical, and strand-specific methylation (Fig. 2).
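The three sequence contexts described above can be illustrated with a minimal Python sketch. This is not part of MethMotif or TFregulomeR; the function names (`cytosine_context`, `contexts_both_strands`) are hypothetical and chosen only for illustration. The sketch classifies each cytosine as CG, CHG, or CHH from its two downstream bases and repeats the scan on the reverse complement, which shows why CG and CHG sites are mirrored on both strands while CHH sites are not.

```python
from typing import List, Optional, Tuple

# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the cytosine at 0-based `pos` as 'CG', 'CHG', or 'CHH'.

    Returns None if the base is not a cytosine, or if the downstream
    sequence is too short or ambiguous (N) to decide.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1 : pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None
    # The first downstream base is H (A, C, or T) at this point.
    return "CHG" if downstream[1] == "G" else "CHH"


def contexts_both_strands(seq: str) -> List[Tuple[str, int, str]]:
    """List (strand, forward-strand position, context) for every cytosine."""
    n = len(seq)
    calls = []
    for i in range(n):
        ctx = cytosine_context(seq, i)
        if ctx:
            calls.append(("+", i, ctx))
    rc = reverse_complement(seq)
    for j in range(n):
        ctx = cytosine_context(rc, j)
        if ctx:
            # Map the reverse-strand index back to forward coordinates.
            calls.append(("-", n - 1 - j, ctx))
    return calls
```

For example, `contexts_both_strands("CAG")` reports a CHG cytosine on each strand, whereas `contexts_both_strands("CAA")` reports a single CHH cytosine on the forward strand only.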
2.4 Online tools for batch querying and data visualization
The batch query function enables the user to search the whole database for occurrences of TFBSs, together with their respective methylation status, within the regions of a BED file. The new functionality returns the global TFBS and context-specific partner information for the chosen cell line and enriched transcription factors. To ease interpretation, the tool produces selected information and figures for the provided genomic locations. Downloading the results requires a single click. Ultimately, the batch query constitutes a robust bioinformatics tool that takes context-specific transcription factor information into account and is accessible to all scientists.
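Conceptually, a batch query of this kind amounts to intersecting user-supplied BED intervals with annotated binding sites. The following Python sketch illustrates only that idea and does not reflect the actual MethMotif implementation: the `Site` record, its `meth` field, and the `batch_query` function are hypothetical, and the coordinates and methylation values in the usage note are invented toy data.

```python
from typing import List, NamedTuple, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


class Site(NamedTuple):
    """Hypothetical TFBS record; not the MethMotif data model."""
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # exclusive
    tf: str
    meth: float  # illustrative mean methylation level of the site


def parse_bed(text: str) -> List[Region]:
    """Parse the first three columns of BED-formatted text."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def batch_query(regions: List[Region], sites: List[Site]) -> List[Tuple[Region, Site]]:
    """Return every (region, site) pair that overlaps by at least one base."""
    hits = []
    for chrom, start, end in regions:
        for site in sites:
            # Half-open intervals overlap when each starts before the other ends.
            if site.chrom == chrom and site.start < end and start < site.end:
                hits.append(((chrom, start, end), site))
    return hits
```

With toy input such as `parse_bed("chr1\t100\t200\n")` and a site `Site("chr1", 150, 160, "CEBPB", 0.1)`, `batch_query` returns that site paired with the query region. A nested loop is adequate for a sketch; for genome-scale inputs, an interval tree or sorted sweep would be a more reasonable choice.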
Researchers looking for a more customizable analysis, or who are more comfortable with R, can drill down further into a sample of interest with the TFregulomeR API (Lin et al., 2020). Provided that internet access is available, this R package can access the entirety of the MethMotif database and generate all the figures and files available through the MethMotif website. Starting with the genomic information of interest, it is possible to analyze the TFBS between cell lines and/or in different CpG contexts. TFregulomeR further enables the deconvolution of TFBS by segregating a list of the cofactors most likely binding the protein of interest. Finally, TFregulomeR streamlines the annotation of context-specific data with genomic location and GREAT gene ontology information. Users can download this package through GitHub (https://github.com/benoukraflab/TFregulomeR) and begin exploring its many functions.