Evolution of Lysine-Specific Demethylase 1 and REST Corepressor Gene Families and Their Molecular Interaction


Lysine-specific demethylase 1A (LSD1) binds to the RCOR gene family of corepressors to erase transcriptionally active marks on histones. Functional diversity in these complexes depends on the type of RCOR included, which modulates the complex's catalytic activity. We studied the duplicative history of the RCOR and LSD gene families and analyzed the evolution of their interaction. We found that RCOR genes are the product of the two rounds of whole-genome duplication that occurred early in vertebrate evolution. In contrast, the origin of the LSD genes traces back to before the divergence of animals and plants. Coimmunoprecipitation experiments using resurrected RCOR and LSD1 proteins of the jawed vertebrate ancestor, and of the common hop, date the origin of LSD1-RCOR interaction to the ancestor of animals, fungi, and plants. Overall, we trace LSD1-RCOR complex evolution and propose that animal, fungi, and plant non-model species offer advantages in addressing questions about the molecular biology of this epigenetic complex.


Introduction
Lysine-specific demethylase 1A (LSD1, KDM1A, BHC110, AOF2) is an epigenetic enzyme that represses gene expression by erasing transcriptionally permissive histone modifications (Shi et al. 2004; Forneris et al. 2005). LSD1 function and stability depend on forming a stable complex with a member of the RCOR family of transcriptional corepressors (Lee et al. 2005; Shi et al. 2005; Forneris et al. 2007). Mammalian LSD1 has a central SWIRM (Swi3p, Rsc8p, and Moira) domain and a C-terminal amino oxidase (AOD) catalytic domain. The AOD domain is interrupted by an alpha-helical tower domain of 92 amino acids that allows the interaction of LSD1 with the RCOR family of proteins, composed in mammals of RCOR1 (CoREST, CoREST1), RCOR2, and RCOR3 (Yang et al. 2006; Forneris et al. 2007; Á.P. . RCOR proteins share a characteristic structure that includes three functional domains: an ELM2 (homology 2 of Egl-27 and MTA1) domain, followed immediately by a SANT (Swi3, Ada2, N-CoR, and TFIIIB) domain, and a second SANT domain at the C-terminus that allows the interaction with nucleosomal DNA (Pilotto et al. 2015). The interSANT sequence or linker domain, sufficient for binding LSD1 (Shi et al. 2005), hugs the tower domain (Yang et al. 2006; A.P. . Although the three RCORs interact with LSD1, the catalytic properties of the resulting complexes differ, and the complexes have been associated with different biological processes. For example, RCOR1 regulates differentiation into various cell lineages (Ballas et al. 2005; Yao et al. 2014; Xiong et al. 2020a) and represses the expression of viral genomes (Zhou et al. 2013), while RCOR2 maintains pluripotency and proliferation of embryonic stem cells (Yang et al. 2011). Both also have relevant roles in central nervous system development (Wang et al. 2016; Monaghan et al. 2017). Unexpectedly, the study of RCOR3 has lagged behind, with no functions associated with it to date.
Little is known about the evolution of these gene families, although studies have reported distinct evolutionary patterns in plant and animal LSD proteins (Zhou and Ma 2008). Today, the availability of whole-genome sequences in a wide range of taxonomic groups opens an outstanding opportunity to shed light on the evolution of gene families. Understanding the duplicative history of gene families is required, among other things, to make biologically meaningful comparisons. Although several questions can be asked by analyzing the gene repertoire in representative species of a given taxonomic group, there are questions related to the LSDs and RCORs that require special attention. For example, when did the expansion of the RCOR repertoire happen? Are the RCOR genes the product of whole-genome duplications or of locus-specific duplications? What is the conservation pattern of the different domains of the LSD and RCOR genes? Is the LSD1-RCOR interaction restricted to metazoans?
In this work, we studied the evolutionary history of the LSD and RCOR gene families in animals. Our results suggest that RCOR genes are ohnologs and that their diversification occurred in the ancestor of jawed vertebrates, while the origin of LSD paralogs is much more ancient. According to their phyletic distribution, LSD1 and RCOR are widespread in metazoans, and are also found in fungi and plants. Our experiments resuscitating the RCOR and LSD1 proteins present in the ancestor of jawed vertebrates indicate that the LSD1-RCOR interaction precedes the RCOR repertoire expansion.
Furthermore, our co-immunoprecipitation experiments using the common hop as a model system suggest that the origin of the LSD1-RCOR interaction traces back to the common ancestor of animals, fungi, and plants.

Results and discussion
The RCOR gene repertoire expanded in the ancestor of jawed vertebrates.
To understand the duplicative history of the RCOR genes, we reconstructed gene phylogenies with different taxonomic samplings. The first analysis aimed to understand the evolution of RCOR genes in vertebrates (Fig. 1), whereas in the second, our sampling effort included representative species of all main groups of animals (Fig. 2).
In the first analysis, our maximum-likelihood tree recovered well-supported clades corresponding to RCOR1 sequences from vertebrates and to RCOR2 and RCOR3 sequences from jawed vertebrates (i.e., gnathostomes) (Fig. 1). In this tree, the clade containing RCOR1 sequences was recovered sister to the RCOR3 clade; however, this relationship is not supported (Fig. 1). The clade containing RCOR2 sequences from jawed vertebrates was recovered sister to the RCOR1/RCOR3 clade (Fig. 1). Our analysis recovered a clade containing RCOR sequences from two species of jawless vertebrates (i.e., cyclostomes), the inshore hagfish and the sea lamprey, sister to the RCOR1 clade from jawed vertebrates (Fig. 1). This topology is compatible with two evolutionary scenarios: 1) the RCOR genes diversified in the vertebrate ancestor, and jawless vertebrates retained only one copy (RCOR1), or 2) the RCOR genes diversified in the ancestor of jawed vertebrates, and the sister group relationship recovered in our gene tree is a phylogenetic artifact. The second scenario is plausible, given that resolving orthology between jawless and jawed vertebrates is a complex evolutionary problem because of the compositional biases of the former group (Qiu et al. 2011; Smith et al. 2013). Additionally, resolving phylogenetic relationships among vertebrates requires a taxonomic sampling that includes more than just vertebrate species.
To further understand whether the RCOR genes diversified in the ancestor of vertebrates or of jawed vertebrates, we performed a phylogenetic analysis extending our sampling to representative species of all main groups of animals and reducing the representation of vertebrates. In addition to showing the presence of a single-copy gene in all major groups of animals other than vertebrates, our phylogenetic tree resolves the sister group relationship between jawless and jawed vertebrates and is consistent with our second proposed scenario (Fig. 2). We recovered the monophyly of the vertebrate clade containing RCOR sequences with strong support (100/1/100, Fig. 2). The clade containing RCOR sequences from jawless vertebrates was recovered sister to the group containing the RCOR1, RCOR2, and RCOR3 clades from jawed vertebrates (Fig. 2), suggesting that the diversification of the RCOR genes occurred between 615 and 473 million years ago (Kumar et al. 2017) in the ancestor of jawed vertebrates, after the divergence from jawless vertebrates. Among jawed vertebrates, our gene tree recovered the sister group relationship between RCOR1 and RCOR2, while the RCOR3 clade was recovered sister to the RCOR1/RCOR2 clade (Fig. 2).
It is widely accepted that the evolution of vertebrates was shaped by ancient whole-genome duplications (WGDs) (Meyer and Schartl 1999; McLysaght et al. 2002; Dehal and Boore 2005; Hoegg and Meyer 2005; Putnam et al. 2008). The most accepted hypothesis invokes two rounds of WGD during the evolutionary history of vertebrates; however, the timing of these duplication events is still a matter of debate (Simakov et al. 2020; Nakatani et al. 2021). Genes that originated as a result of WGDs are called ohnologs, in honor of Susumu Ohno, who was the first to propose the occurrence of two rounds of WGD early in the evolution of vertebrates (Ohno 1970). The expansion of RCOR genes in jawed vertebrates suggests that they could be the result of the WGDs that occurred during the evolution of vertebrates. After checking the repository of genes retained from WGDs in vertebrate genomes (Singh and Isambert 2020), we confirmed that the RCOR genes are indeed the product of the vertebrate-specific WGDs. Based on this evidence, we propose ohnologs as the appropriate term to describe the homology relationship among them.
The role of each of the RCOR proteins is not yet clear. RCOR1 plays a specific role in keeping viral genomes in latency in neurons (Roizman et al. 2011). Studies in RCOR1 null mice reveal a crucial role of this protein in erythropoiesis and the proliferation of regulatory T cells (Yao et al. 2014; Xiong et al. 2020b). On the other hand, RCOR2 is significantly expressed in embryonic stem cells, regulating their proliferation and pluripotency (Yang et al. 2011). In the case of RCOR3, there is no evidence regarding its role in mammalian physiology and development. Together with other reports, this evidence highlights the contribution of WGDs to functional diversification among RCOR ohnologs, supporting the pivotal role of WGDs in the origin of biological novelties.

LSD1 and LSD2 were present in the ancestor of all animals
Given the specificity of the interaction of RCOR proteins exclusively with LSD1 but not with its paralog Lysine-specific demethylase 1B (LSD2, KDM1B), we sought to investigate the evolutionary history of the LSD gene family in animals. To do so, we performed two phylogenetic analyses. In the first, we included representative species of the main groups of vertebrates (Fig. 3), whereas, in the second, our sampling effort expanded to the main groups of animals (Fig. S1).
In our first maximum likelihood tree, we recovered the monophyly of LSD1 and LSD2 genes from vertebrates (Fig. 3), suggesting that the ancestor of the group, which existed between 676 and 615 million years ago (Kumar et al. 2017), had both paralogs.
To further understand the origin of these two genes, we analyzed their duplicative history now by including representative species of all main groups of animals. Our phylogenetic analyses also recovered the monophyly of each paralog, LSD1 and LSD2, suggesting that the ancestor of all animals had both paralogs in their genome (Fig. S1). Thus, our results are consistent with previous reports suggesting an early origin of LSD genes prior to the divergence of animals and plants (Zhou and Ma 2008).

Phyletic distribution of RCOR and LSD genes
To better describe the evolutionary history of RCOR and LSD genes in animals, we analyzed their phyletic distribution, i.e., their presence and absence in different animal groups. RCOR1, RCOR2, and RCOR3 are present in all major groups of vertebrates other than cyclostomes, which possess a single gene copy (named RCOR1/2/3) (Figs. 1, 2, and 4A). Interestingly, although RCOR1 and RCOR3 are present in all examined bird species, RCOR2 is restricted to species belonging to the orders Psittaciformes, Passeriformes, Accipitriformes, and Anseriformes (Table S1). We therefore suggest that the RCOR2 gene was lost independently in different bird lineages. RCOR1 and RCOR2 play preponderant roles in the development of the cerebral cortex in mammals. Mice null for RCOR2 early in neural progenitor cells show decreased neocortex thickness and brain size (Wang et al. 2016), whereas knocking down RCOR1 in later stages of development alters the differentiation and migration of cortical neurons (Fuentes et al. 2012). Although there are no studies focused on the functional roles of RCORs in birds, it would be interesting to compare the formation of the dorsal telencephalon between birds with the full complement of RCOR genes and those lacking RCOR2.
Our analysis shows that species in which RCOR proteins co-exist with LSD1 outnumber species with only one of them (Tables S2 and S4). However, there are some exceptions. For instance, the water bear (Ramazzottius varieornatus) possesses LSD1 but not RCOR. Bearing in mind that identifying a gene does not necessarily mean that the protein is expressed, the water bear example would imply that LSD1 is functionally independent of RCOR in some animal lineages. This species and others could provide a suitable platform to investigate RCOR (or LSD1) functions that are LSD1 (or RCOR)-independent, to further understand the diverse range of biological processes in which these proteins participate.
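A presence/absence comparison of this kind can be sketched in a few lines of Python. This is only an illustration, not the procedure used in this study: the table layout and the function name `partner_independent` are assumptions, and only the two rows shown (human and water bear) are taken from the text.

```python
# Presence (True) / absence (False) of each gene family per species.
# Only these two rows come from the text; a full table would hold every
# species listed in the supplementary tables.
presence = {
    "Homo sapiens": {"LSD1": True, "RCOR": True},
    "Ramazzottius varieornatus": {"LSD1": True, "RCOR": False},
}


def partner_independent(table, gene, partner):
    """Return the species that carry `gene` but lack `partner` (hypothetical helper)."""
    return sorted(
        species
        for species, genes in table.items()
        if genes.get(gene, False) and not genes.get(partner, False)
    )


lsd1_only = partner_independent(presence, "LSD1", "RCOR")
rcor_only = partner_independent(presence, "RCOR", "LSD1")
print("LSD1 without RCOR:", lsd1_only)
print("RCOR without LSD1:", rcor_only)
```

Species returned by either call are the candidates for studying partner-independent functions.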

Mammals and turtles possess the regulatory sequences to express LSD1-8a specifically in neurons
The LSD1 E8a microexon inclusion occurs during neuronal differentiation and enhances neurite morphogenesis in mammals (Zibetti et al. 2010). Further research has demonstrated a role for neuroLSD1 in modulating behavior (Rusconi et al. 2016; Wang et al. 2016).
NeuroLSD1-KO mice display less anxiety in various behavioral tests (Rusconi et al. 2016), a diminished response to epileptogenic stimuli (Rusconi et al. 2016), and defects in spatial learning and memory (Wang et al. 2015;Longaretti et al. 2020). When discovered, neuroLSD1 was described as a mammalian-specific protein (Zibetti et al. 2010), although the same microexon was later identified in turtles (Wang et al. 2015), and an 8a-like exon was also described in zebrafish (Tamaoki et al. 2020).
Due to E8a's relevance in neuronal development and behavior, and the lack of a systematic study of the E8a phyletic distribution, we analyzed the presence of the E8a sequence at the LSD1 gene in representative species of all main groups of vertebrates.
To annotate E8a sequences, we selected the intronic region between exons 8 and 9, using the human LSD1 genomic sequence as reference. We considered as E8a those sequences that aligned with human (Homo sapiens) E8a (Zibetti et al. 2010) or zebrafish (Danio rerio) E8a-like (Tamaoki et al. 2020) and included conserved splice donor and acceptor sites (GT and AG, respectively). Our analyses confirmed that the E8a microexon (i.e., DTVK) is also present in turtles (Fig. 4A), consistent with previous reports (Zibetti et al. 2010; Wang et al. 2015). On the other hand, crocodiles, amphibians, lizards, and snakes do not have the E8a sequence. Interestingly, the intronic region of these species is shorter than that of mammals and turtles (mean of 1537.5 bp in crocodiles, amphibians, lizards, and snakes versus 9587.2 bp in mammals and turtles; Table S5).
In birds, we found the E8a sequence only in species belonging to the orders Phaethontiformes, Gaviiformes, Opisthocomiformes, and Accipitriformes (Fig. 4A and Table S5). In bony fish, cartilaginous fish, and cyclostomes, we found a coding sequence of six amino acids that aligns with the E8a-like exon previously described in zebrafish (Table S5) (Tamaoki et al. 2020). Thus, the microexon E8a appears to have been present in the vertebrate ancestor and to have been lost multiple times, as well as to have diverged (e.g., mammal vs. fish sequence), during the radiation of the group.
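The annotation criteria described above (a sequence within the intron between exons 8 and 9 that encodes the microexon and is flanked by an AG acceptor and a GT donor) can be illustrated with a simplified scan. This is a sketch under stated assumptions, not the alignment-based procedure used in this study: it requires an exact peptide match, assumes the microexon starts in codon phase 0, and the function names and the toy intron are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_microexon(intron, peptide="DTVK"):
    """Return 0-based start positions of candidate microexons in `intron`.

    A candidate encodes `peptide` exactly, is immediately preceded by the
    acceptor dinucleotide AG and immediately followed by the donor GT.
    """
    seq = intron.upper()
    size = 3 * len(peptide)
    hits = []
    for start in range(2, len(seq) - size - 1):
        exon = seq[start:start + size]
        if (
            seq[start - 2:start] == "AG"
            and seq[start + size:start + size + 2] == "GT"
            and translate(exon) == peptide
        ):
            hits.append(start)
    return hits


# Toy intron (made up): ...AG | GAT ACT GTG AAG | GT...
toy_intron = "cccAGGATACTGTGAAGGTaaa"
print(find_microexon(toy_intron))
```

Passing a different `peptide` adapts the same scan to a diverged variant, although an alignment-based search tolerates substitutions that this exact-match version would miss.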
The retention of E8a in LSD1 transcripts is regulated by three splicing factors (NOVA1, FUBP, and SRRM4). A palindromic sequence located ~300 bp 3′ of E8a can trap the exon and its donor and acceptor splice sites in a double-stranded RNA structure (Rusconi et al. 2015; Hwang et al. 2018). SRRM4 binds the UGCUGC motif upstream of the splice acceptor site of exon E8a (Hwang et al. 2018) and, together with NOVA1 and FUBP, forms a complex that can keep the pre-mRNA single-stranded and thereby elicit exon E8a retention. We found that the palindromic sequence and the UGCUGC motif (hereafter, splicing regulatory sequences, SRS) are absent in most vertebrates other than mammals and turtles (Fig. 4A). Given the presence of E8a and the absence of SRS in most vertebrate groups, we hypothesize that the expression of LSD1-8a is not restricted to neuronal tissue in these animals. This finding agrees with previous results in zebrafish in which LSD1-8a is ubiquitously expressed (Tamaoki et al. 2020 Upadhyay et al. 2014;Pilotto et al. 2015), suggesting that expansion of the RCOR repertoire as a product of the whole-genome duplications that occurred early in the evolution of vertebrates has extended the functional capabilities of the LSD1-RCOR complexes.
To delve into the evolution of the RCOR and LSD1 molecular interaction, we Low conservation of the linker domain in RCORs and tower domain in LSD1 was somewhat surprising given their role in forming the molecular complex. We analyzed the conservation of amino acids key to the interaction to get insight into this intriguing aspect.
To this end, we carried out a comparative analysis between RCOR1 and RCOR3 based on their 3D structures in complex with LSD1 (Yang et al. 2006;Hwang et al. 2011; Á.P. . Using this information, we searched for the conservation of those key amino acids in equivalent positions in RCOR2. Interaction interfaces of LSD1-RCOR1 and LSD1-RCOR3 complexes were divided into four segments, I to IV (Fig. S2). In segment I, two salt bridges are formed between two Asp residues of RCOR1 and two Lys residues of LSD1. These residues are in the same positions in RCOR3, forming equivalent salt bridges with LSD1. Segment II contains two salt bridges and one hydrogen bond between RCOR1 and LSD1. These are also conserved and in close contact in the LSD1-RCOR3 complex. In segment III, a Lys in position 371 of RCOR3 substitutes for the Arg of RCOR1 in the salt bridge with Asp495 of LSD1. Segment IV is not essential for LSD1-RCOR1 binding (Shi et al. 2005) but might aid the indirect interaction of LSD1 with the DNA or the histone octamer (Kim et al. 2020). In segment IV, Lys397, Asp401, and Asp407 are adequately located to form two additional salt bridges in both the RCOR1-LSD1 and RCOR3-LSD1 complexes (Fig. S2).
The residues mentioned above are either conserved or substituted by chemically equivalent amino acids in RCOR2, except for Asp320, which is replaced with a Gly (Fig. S2). High conservation of key amino acids in positions relevant for LSD1 interaction in the RCOR proteins (highlighted in orange in Fig. 5) indicates that they evolved before the repertoire expansion and have remained that way for the last 615 million years. On the other hand, we hypothesize that the variability of the tower and linker domains has made possible the flexible encounter of different RCORs with LSD1 and the modulation of the catalytic activity that these exert on LSD1.
Residues in most conserved positions in the primary structure of RCOR1 and RCOR3 linker domains constitute the interaction interface with LSD1
Next, we expanded our analysis to all the residues that comprise the tower and linker domains (i.e., not only key residues for interaction). We investigated whether there is a relationship between the conservation score of a particular residue and its position in the 3D structure of the complex. Again, these analyses were restricted to RCOR1 and RCOR3, given their known 3D structure in complex with LSD1.
In the case of LSD1, 39.6% of the residues of the tower domain were classified as highly conserved (Fig. 6A, dark red). Of this group, 54.7% of the amino acids directly participate in the interaction interface with RCOR1. These residues are distributed along the four segments analyzed in supplementary figure 3. The other group, i.e., the highly conserved amino acids that do not participate in the interaction interface, is located mostly on the basal/middle section of the Tα2 helix and the distal segment of the Tα1 helix (Fig. 6A). The functional significance of these amino acids remains to be studied.
As for RCOR1, 27.4% of the residues in its linker domain displayed highly conserved scores (Fig. 6B). Remarkably, 76.5% of this group of residues participate in the interaction interface with LSD1, and the majority are located in the Lα2 helix. Interestingly, three highly conserved residues in the Lα2 helix, Arg347, Gln350, and Gln354 (Fig. 6B, light blue arrows), are oriented away from LSD1. The function of these amino acids is currently unknown. Finally, in the case of RCOR3, 26.6% of the residues in its linker domain are highly conserved, and 62.5% of them face LSD1 (Fig. 6C). Furthermore, for RCOR3, highly conserved amino acids are not solely enriched in the Lα2 helix but are more evenly distributed between the Lα1 and Lα2 helices. In addition, as seen with RCOR1, three Lα2 residues, Gln315, Asn316, and Gln319, are oriented on the opposite side of the Lα2 helix, away from the interaction interface (Fig. 6C, light blue arrows).
Thus, this analysis sheds light on several conserved "blocks" that we hypothesize fulfill relevant molecular functions in each protein.
As an example, Lys447 (Fig. 6A, red arrow) in the tower domain of LSD1, a highly conserved residue that does not participate in the interaction interface, has been previously shown to inform the structural relationships between LSD1 and HDAC1 within the RCOR1 ternary core complex (Song et al. 2020).
Also, the most conserved amino acids in RCOR1 and RCOR3 linker domains were predictive of LSD1 interaction capabilities. Most interesting are the residues that proved highly conserved but whose molecular function has not yet been explored. Specifically, the outward-facing arginine, glutamine, and asparagine of RCOR1 and RCOR3 hold promise for further analyses.
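The two percentages reported for each domain in this section (the share of highly conserved residues, and the share of those that lie at the interaction interface) amount to simple set arithmetic. The sketch below shows the calculation on made-up data. The grade cutoff of 8 on a 1 to 9 conservation scale, the function name, and the toy residues are all assumptions; the text does not state the threshold used to call a residue highly conserved.

```python
def conservation_summary(grades, interface, cutoff=8):
    """Summarize conservation for one domain.

    grades: dict mapping residue number -> conservation grade (1-9, 9 = most conserved).
    interface: set of residue numbers that contact the partner protein.
    cutoff: minimum grade counted as "highly conserved" (assumed value).
    Returns (percent of residues highly conserved,
             percent of highly conserved residues at the interface).
    """
    conserved = {res for res, grade in grades.items() if grade >= cutoff}
    pct_conserved = 100.0 * len(conserved) / len(grades)
    pct_interface = 100.0 * len(conserved & interface) / len(conserved) if conserved else 0.0
    return pct_conserved, pct_interface


# Toy domain of ten residues (made-up grades and contacts).
toy_grades = {1: 9, 2: 8, 3: 5, 4: 9, 5: 2, 6: 1, 7: 8, 8: 3, 9: 4, 10: 6}
toy_interface = {1, 2, 3, 4}
summary = conservation_summary(toy_grades, toy_interface)
print(summary)
```

Highly conserved residues outside the `interface` set are the ones flagged in this section as candidates for functions unrelated to the LSD1-RCOR contact.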

RCOR and LSD1 interaction precedes the RCOR repertoire expansion in the ancestor of jawed vertebrates
Given the evolutionary pattern of repertoire expansion of RCOR genes in the jawed vertebrate ancestor and the lack of a systematic assessment of RCOR-LSD1 interaction in non-vertebrate species, we wondered whether the single RCOR protein present in the jawed vertebrate ancestor, before the RCOR repertoire expansion, was able to interact with the corresponding LSD1 ancestral protein. To investigate this matter, we manually curated RCOR1, RCOR2, RCOR3, and LSD1 sequences to reconstruct the RCOR and LSD1 proteins present in the ancestor of jawed vertebrates. We then synthesized plasmids harboring RCOR and LSD1 attached to Myc and HA epitopes, respectively. We used these constructs for co-immunoprecipitation (Co-IP) assays on HEK293 cell lysates, with human RCOR1 and LSD1 sequences as controls (Fig. 7).
As expected, RCOR1 and LSD1 human proteins co-precipitated when we used anti-Myc antibodies for precipitation (Fig. 7C and S3). Importantly, we also found that the ancestral RCOR protein formed an immunocomplex with the ancestral LSD1 protein (Fig. 7C), suggesting that both proteins already formed a complex in the jawed vertebrate ancestor, before the RCOR repertoire expansion, between 615 and 473 million years ago (Kumar et al. 2017). This result is in agreement with previous findings of an LSD1-RCOR protein complex in the fruit fly (Drosophila melanogaster) (Mačinković et al. 2019).
Importantly, LSD1-RCOR interaction has only been demonstrated for human, rodent, and fruit fly proteins (Hakimi et al. 2003;Dallman et al. 2004;Shi et al. 2005;Yang et al. 2006; A.P. . The extent to which this interaction is also present in other animals or other non-animal species (e.g., fungi, plants) is currently unknown. This is partly due to the fact that the evolution of the RCOR gene family has remained unstudied to date.

RCOR and LSD1 interaction is not restricted to animals
As mentioned, LSD1-RCOR complex formation depends on linker-tower domain interaction. The tower domain is an interruption of the AOD domain of LSD1 by a ~90 amino acid protrusion composed of two intertwined alpha-helices. Tower domain presence/absence can be predicted by the size of the interrupting segment and its probability of forming a coiled-coil structure (Lupas et al. 1991; Zhou and Ma 2008).
Previous studies have shown that animal LSD1 shows a high probability of coiled-coil formation, suggesting that LSD1-RCOR interaction is widespread in metazoans (Lupas et al. 1991; Zhou and Ma 2008). Interestingly, Zhou and Ma (2008) described that although the catalytic domain of fungal LSDs is interrupted, as in the case of animals, no coiled-coil structure is predicted. According to our search, however, the fly pathogenic fungus (Entomophthora muscae) possesses an LSD gene (GENC01027297.1) with a similar-sized interruption and a predicted coiled-coil structure in its catalytic domain (Fig. 8A, B) (Lupas et al. 1991; Zhou and Ma 2008). To our surprise, this species also harbors an RCOR gene in its genome (Fig. 8A). This is in contrast to model organisms like the budding yeast (Saccharomyces cerevisiae) and the fission yeast (Schizosaccharomyces pombe). The budding yeast lacks an LSD gene, and the fission yeast has two LSD genes with no coiled-coil detected in their AOD domains and no RCOR identified in its genome (Fig. 8A) (Shi et al. 2004). Thus, RCOR proteins are not restricted to animals, and by virtue of their interacting domains, we hypothesize that LSD1-RCOR interaction is present at least in the fly pathogenic fungus (Entomophthora muscae).
To investigate if these interactions could be predicted in organisms that diverged before the animal/fungi ancestor, we turned our attention to plants. We chose plants These findings suggest that LSD1-RCOR interaction could also be present in some species of plants. To investigate this, we used the sequences of the RCOR and LSD1 proteins of the common hop for additional Co-IP experiments. We chose the common hop because its LSD1 sequence is strongly predictive of a tower domain: like the human LSD1, it has two consecutive coiled-coil motifs in its catalytic domain (Fig. 8A).

Competing interests
The authors declare no competing interests.

Figure 4:
Phyletic distributions of RCOR and LSD genes, LSD1 microexon, and neuron-specific splicing regulatory sequences in vertebrates. A) Distribution of the RCOR and LSD genes, LSD1 microexon, and neuron-specific splicing regulatory sequences in main groups of vertebrates. B) Dot-plot of pairwise sequence similarity between the RCOR2 gene of the painted turtle (Chrysemys picta) and the corresponding syntenic region in the chicken (Gallus gallus) and New Caledonian crow (Corvus moneduloides). (a) RCOR2 is present only in the bird orders Psittaciformes, Passeriformes, Accipitriformes, and Anseriformes. (b) Lampreys have a single copy of the RCOR gene. (c) Some species of birds have a -DTVE- microexon.

Figure 8:
B) Multiple sequence alignment of human, Entomophthora muscae, and common hop LSD1 and human LSD2 in the tower domain region of the human LSD1 sequence. C) Sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) followed by immunoblot detection of the immunoprecipitate obtained using an α-Myc-RCOR antibody. Input represents whole-cell protein extracts, IgG serves as a negative control for the immunoprecipitation, while IP anti-Myc is the immunoprecipitation of common hop RCOR protein. Molecular weights are indicated at right. (a) LSD fission yeast homologs according to Nicolas et al. (2006). (b) LSD thale cress homologs according to Spedaletti et al. (2008).

Protein sequence collection and phylogenetic analyses
We obtained lysine-specific demethylase 1 (LSD1), lysine-specific demethylase 2 (LSD2), REST Corepressor 1 (RCOR1), REST Corepressor 2 (RCOR2), and REST Corepressor 3 (RCOR3) protein sequences from the Ensembl v.102 (Yates et al. 2020) using orthology and paralogy estimates from the EnsemblCompara database (Herrero et al. 2016); these estimates were obtained from an automated pipeline that considers both synteny and phylogeny to generate orthology mappings. Further, we also retrieved sequences from the National Center for Biotechnology Information (NCBI) (Sharma et al. 2018) using the human (Homo sapiens) sequence as the reference for protein-BLAST (blastp) (Altschul et al. 1990) against the non-redundant database (nr) with default parameters. In each case, we corroborated the presence of all described protein domains (SWIRM/AOD for LSD, ELM2/SANT1/SANT2 for RCOR).
We implemented two types of analyses that involved different sampling strategies. The first analysis aimed to understand the evolutionary history of these groups of genes in vertebrates, and our taxonomic sampling included representative species of all main groups of vertebrates. The second analysis aimed to investigate the evolution of these genes in metazoans; thus, our sampling included species of all main groups of animals. Accession numbers and details about the taxonomic sampling are available in Supplementary Table S1. We used the software MAFFT v.7 (Katoh and Standley 2013) to align amino acid sequences, allowing the program to choose the alignment strategy (L-INS-i for LSD alignment in vertebrates and metazoa; FFT-NS-i for RCOR in vertebrates and metazoa). We estimated phylogenetic relationships using a maximum likelihood (ML) approach as implemented in IQ-Tree v.1.6.12 (Nguyen et al. 2015). We used the model selection tool of IQ-Tree v.1.6.12 (Kalyaanamoorthy et al. 2017) to select the best-fitting models of amino acid substitution, which selected JTT+I+G4 for LSD in vertebrates and for RCOR in vertebrates and metazoans. For LSD in metazoans, the model selected was LG+F+R10. We assessed node support using the Shimodaira-Hasegawa approximate likelihood-ratio test (Guindon 2010), the approximate Bayes test (Anisimova and Gascuel 2006; Guindon 2010), and the ultrafast bootstrap approximation with 1000 pseudoreplicates (Minh et al. 2013; Hoang et al. 2018). We repeated each phylogenetic estimation ten times to explore the tree space, and the tree with the highest likelihood score was chosen. Sequences of monoamine oxidases A and B (MAO-A and MAO-B) and of mitotic deacetylase-associated SANT domain protein (MIDEAS) were used as outgroups for LSD1/2 and RCOR1/2/3, respectively.
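The alignment and tree-search steps above can be sketched as command lines assembled in Python. This is an illustrative reconstruction, not the commands actually run for this study: the file names, run prefixes, seeds, and the number of SH-aLRT replicates are assumptions (the text states 1000 replicates only for the ultrafast bootstrap), and the helper names are hypothetical. The commands are built but not executed here.

```python
def mafft_command(fasta_in):
    # --auto lets MAFFT choose the alignment strategy (e.g. L-INS-i or FFT-NS-i).
    # MAFFT writes the alignment to standard output.
    return ["mafft", "--auto", fasta_in]


def iqtree_command(alignment, prefix, seed, model="MFP"):
    # -m MFP   : model selection followed by tree inference
    # -alrt    : SH-like approximate likelihood-ratio test (replicate count assumed)
    # -abayes  : approximate Bayes test
    # -bb      : ultrafast bootstrap with 1000 pseudoreplicates
    # -seed/-pre keep independent runs reproducible and in separate output files
    return [
        "iqtree", "-s", alignment, "-m", model,
        "-alrt", "1000", "-abayes", "-bb", "1000",
        "-seed", str(seed), "-pre", prefix,
    ]


def best_run(log_likelihoods):
    """Return the run prefix with the highest (least negative) log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


# Ten independent searches over the same alignment (hypothetical file name).
commands = [
    iqtree_command("rcor_vertebrates.aln.fasta", f"rcor_run{i}", seed=i)
    for i in range(1, 11)
]
print(" ".join(mafft_command("rcor_vertebrates.fasta")))
print(" ".join(commands[0]))
```

Each command list can be passed to `subprocess.run`. Once the runs finish, the log-likelihood reported by each run would be collected into a dictionary keyed by prefix and handed to `best_run`.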

Dot-plots
We retrieved the chromosomal region containing the RCOR2 gene of the painted turtle (NW_007359864.1) and the corresponding syntenic region in the chicken (Chromosome 33-NC_008465.4), and New Caledonian crow (Chromosome 34-NC_045509.1) based on the location of the flanking genes (NAA40 and MARK2). We aligned RCOR2 syntenic regions using PipMaker (Schwartz et al. 2000).
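For readers unfamiliar with dot-plots, the idea can be shown with an exact word-match version: every position pair at which the two sequences share the same short word becomes a dot, and runs of dots along a diagonal indicate conserved segments. This is a simplified stand-in and not the PipMaker method, which builds its plots from local alignments; the word size and the toy sequences are assumptions.

```python
def dot_plot(seq_a, seq_b, word=4):
    """Return (i, j) pairs where seq_a[i:i+word] == seq_b[j:j+word]."""
    # Index every word of seq_b by its start position.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up every word of seq_a in that index.
    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots


# Toy sequences (made up).
dots = dot_plot("ACGTACGT", "TTACGTAA", word=4)
print(dots)
```

A region with no diagonal signal in one species, flanked by conserved diagonals, is the pattern expected when a gene present in the reference is absent from the syntenic region being compared.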

Domain conservation analysis
We performed four amino acid alignments corresponding to LSD1, RCOR1, RCOR2, and RCOR3, including representative species of all main groups of jawed vertebrates. Alignments were conducted using MAFFT v.7 (Katoh and Standley 2013), allowing the program to choose the alignment strategies (L-INS-i, in all cases). Then, using the alignments as input, we estimated normalized conservation scores for each alignment independently using the ConSurf WebServer (Landau et al. 2005;Ashkenazy et al. 2010;Ashkenazy et al. 2016). Protein domain positions were inferred using the human proteins as input using the InterPro web server (Blum et al. 2021).
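To make the notion of a per-position conservation score concrete, the sketch below scores alignment columns with normalized Shannon entropy. This is a deliberately simpler technique than the one behind the ConSurf scores used here, which rely on evolutionary rate estimates over a phylogeny; the function name and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column between 0 and 1 (1 = invariant).

    alignment: list of equal-length aligned sequences, with '-' for gaps.
    Gaps are ignored; an all-gap column scores 0.
    """
    max_entropy = math.log2(20)  # 20 amino acids, uniformly distributed
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment (made up).
scores = column_conservation(["MKV", "MRV", "MKI", "M-V"])
print([round(s, 3) for s in scores])
```

Entropy treats every substitution as equally costly and ignores how the sequences are related, which is why rate-based scores are preferable for the comparisons made in this study.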

Fungal and plant RCOR and LSD1 sequences
Human RCOR1 and LSD1 protein sequences were used as references for blastp searches against the non-redundant protein sequences (nr) database, or for translated nucleotide BLAST (tblastn) (Gertz et al. 2006) searches against either the nucleotide collection (nr/nt) or the transcriptome shotgun assembly (TSA) databases. Criteria to classify a product as RCOR were: e-value < 10^-40 and presence of the four classical domains, ELM2, SANT1, linker, and SANT2, in that order. Criteria to classify a protein as LSD1 were: e-value < 10^-40 and presence of the three classical domains in the order SWIRM, AOD-N, tower, AOD-C (the tower splits the AOD domain into N- and C-terminal parts). Tower prediction was based on the size of the interruption of the AOD domain (~90 residues) and a strong (>0.8) prediction of coiled-coil structure formation in the center of the AOD domain. Coiled-coils were predicted using the webserver COILS (Lupas et al. 1991; Zhou and Ma 2008). RCOR and LSD1 functional domains were predicted using the InterPro web server (Blum et al. 2021).
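The classification criteria above can be written as a small set of checks. The sketch below is an illustration under explicit assumptions rather than the procedure used in this study: the 0.8 cutoff is read as a minimum coiled-coil probability, the tolerance around ~90 residues is invented for the example, the linker is treated as the region between the two SANT domains rather than as a separately annotated domain, and all function names are hypothetical.

```python
def domains_in_order(found, expected):
    """True if `expected` occurs as an ordered subsequence of `found` (N- to C-terminus)."""
    remaining = iter(found)
    return all(domain in remaining for domain in expected)


def has_tower(interruption_len, coil_probs, target_len=90, tolerance=20, min_prob=0.8):
    """Predict a tower domain from the AOD interruption size and coiled-coil probabilities.

    target_len and min_prob follow the text (~90 residues, 0.8 read as a lower bound);
    tolerance is an assumed value.
    """
    size_ok = abs(interruption_len - target_len) <= tolerance
    coil_ok = max(coil_probs, default=0.0) > min_prob
    return size_ok and coil_ok


def is_rcor(evalue, domains):
    return evalue < 1e-40 and domains_in_order(domains, ["ELM2", "SANT", "SANT"])


def is_lsd1(evalue, domains, interruption_len, coil_probs):
    return (
        evalue < 1e-40
        and domains_in_order(domains, ["SWIRM", "AOD"])
        and has_tower(interruption_len, coil_probs)
    )


# Made-up hits for illustration only.
print(is_rcor(1e-50, ["ELM2", "SANT", "SANT"]))
print(is_lsd1(1e-60, ["SWIRM", "AOD"], 92, [0.10, 0.95, 0.90]))
```

Keeping the tower check separate from the domain-order check mirrors the text, where the tower is inferred from the interruption and the coiled-coil prediction rather than from a domain annotation.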

Co-immunoprecipitation assays
The coding sequences for the jawed vertebrate ancestor LSD1 (from the start of the SWIRM domain until the end of the AOD domain), human LSD1 (same as above), and common hop LSD1 (same as above), and for the jawed vertebrate ancestor RCOR (from the start of the ELM2 domain until the end of the SANT2 domain), human RCOR1 (same as above), and common hop RCOR (same as above), along with nuclear localization signals, were obtained by gene synthesis (Twist Bioscience Corporation). Then, LSD1 sequences were cloned into the pCGN plasmid in frame with the HA epitope. RCOR sequences were cloned into the pCS2+MT plasmid in frame with Myc epitopes. Reading frames were confirmed by sequencing and by western blot. Plasmid pairs were co-transfected in equimolar amounts into HEK293T cells. Twenty-four hours after transfection, whole-cell RIPA extracts were immunoprecipitated with α-Myc antibodies as we have described previously (Hakimi et al. 2003;Dallman et al. 2004;Shi et al. 2005;Yang et al. 2006; A.P.

Figure 7:
Multiple sequence alignments of the jawed vertebrate ancestral RCOR, human RCOR1, RCOR2, and RCOR3 linker and SANT2 domains. C) Sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) followed by immunoblot detection of the immunoprecipitate obtained using an α-Myc antibody. Input represents whole-cell protein extracts, IgG serves as a negative control for the immunoprecipitation, while IP anti-Myc is the immunoprecipitation of ancestral RCOR protein. Molecular weights are indicated at right.