Differential Expression of Members of SOX Family of Transcription Factors in Failing Human Hearts.

The Sry-related high-mobility-group box (SOX) gene family, with 20 known transcription factors in humans, plays an essential role in development and disease processes. Several SOX proteins (SOX4, SOX9, and SOX11) are required for normal heart morphogenesis, and SOX9 has been shown to contribute to cardiac fibrosis. However, the differential expression of other SOX factors and their roles in the failing human myocardium have not been explored. Here we used whole-transcriptome sequencing (RNA-seq), gene co-expression analysis, and meta-analysis to examine whether any SOX factors might play a role in the failing human myocardium. RNA-seq analysis was performed on cardiac tissue samples from patients with heart failure (HF) due to dilated cardiomyopathy (DCM) or hypertrophic cardiomyopathy (HCM) and from healthy donors (NF). The RNA levels of the 20 SOX genes were extracted from the RNA-seq data and compared among the three groups. The RNA levels of four SOX genes were significantly upregulated in DCM or HCM compared to NF. However, only the SOX4 and SOX8 proteins were markedly increased in the HF groups. A moderate to strong correlation was observed between the RNA levels of SOX4/8 and fibrotic genes across individuals. Gene co-expression network analysis identified genes that are associated with SOX4 and respond similarly to perturbations in cardiac tissues. Using a meta-analysis combining epigenetic and genome-wide association data, we report several genomic variants associated with the HF phenotype that are linked to SOX4 or SOX8. In summary, our results implicate SOX4 and SOX8 in cardiomyopathy leading to HF in humans. The molecular mechanisms associated with them in HF warrant further investigation.
BACKGROUND: Heart failure (HF) affects 5.7 million people in the U.S., with more than 500,000 new cases added annually. New therapeutic strategies are still needed for HF. The pathological features of the failing heart are driven by transcriptome reprogramming. Understanding the transcriptional control of HF is beneficial for developing novel therapies.
TRANSLATIONAL SIGNIFICANCE: This study revealed that SOX4 and SOX8 were significantly increased in cardiac tissues of HF patients and identified HF-associated genetic variants and epigenetic changes linked to these two genes. Understanding the role of SOX4 and SOX8 in the failing human myocardium, and the gene regulatory networks related to them, may help to identify potential therapeutic targets for HF.


Background
Heart failure (HF) is a pandemic condition affecting at least 26 million people worldwide [1]. A failing heart may undergo cardiac remodeling, with changes in cardiac structure and function that lead to either dilated cardiomyopathy (DCM) or hypertrophic cardiomyopathy (HCM). These pathological features are largely driven by transcriptome reprogramming in response to pathophysiological stimuli such as oxidative stress and inflammation [2,3], and this reprogramming is primarily regulated by transcription factors (TFs). Increasing the knowledge of the transcriptional control mechanism, together with ensuring wider drug screening and repurposing capabilities, can be beneficial for developing novel therapies for HF.
The SRY-related high-mobility-group box (SOX) gene family, with 20 known TFs in humans, plays essential roles during embryonic development and cell fate determination as well as in many disease processes and functions, such as immunity and inflammation [4]. It is known that SOX4 and SOX11 (members of the SOXC group) are critical for cardiac outflow tract formation, whereas SOX9 (a member of the SOXE group) is expressed in the cardiac cushion mesenchyme and is required for heart valve development [5]. Additionally, SOX6, a member of the SOXD group, has been shown to regulate cardiac myocyte development [6]. Genome-wide gene expression analysis of the infarcted mouse heart has revealed that SOX9 is a potential transcriptional regulator of the genes that mediate cardiac fibrosis [7]. A recent study demonstrated that SOX9 regulates myocardial fibrosis during ischemic injury in animal models [8]. SOX17 has been identified as a candidate risk gene for atrial hypertension with congenital heart disease [9,10]. However, the differential postnatal expression of SOX TFs and their functional roles in the failing human myocardium have not been explored. Various disorders can cause HF. Herein, we analyzed the whole-transcriptome data generated by the Myocardial Applied Genomics Network (MAGNet) from left ventricle tissues of hearts from DCM and HCM patients and from non-failing individuals (NF) to evaluate the expression of SOX genes and validate them in the human myocardium.

RNA-seq analysis.
RNA-seq data were obtained from the Myocardial Applied Genomics Network (MAGNet) dataset (GSE141910) [11]. Paired-end reads were mapped to the human hg19 reference genome, followed by transcript assembly and quantification, using the STAR package (v2.5.2b) with default parameters [12]. Only uniquely mapped reads were considered for further analysis. DESeq2 (v3.8), a program that uses count-based matrices to identify differentially expressed genes, was used to determine gene signatures [13].
Only genes with raw counts > 5 in all individually sequenced samples were examined, to ensure reliable and accurate differential expression analysis. One-way ANOVA and the Benjamini-Hochberg false discovery rate procedure were applied to each of the differentially expressed genes across the groups. The count values were normalized to Transcripts Per Kilobase Million (TPM). Genes with changes > 1.5-fold and an adjusted p < 0.05 were considered differentially expressed between groups. The online tool NetworkAnalyst [14] was used for co-expressed gene network analysis.
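The filtering and normalization steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which relied on STAR and DESeq2); the function names, the example inputs, and the choice to treat fold changes symmetrically (up or down) are assumptions made for illustration only.

```python
def passes_count_filter(counts_per_sample, min_count=5):
    """Keep a gene only if its raw count exceeds min_count in every sample."""
    return all(c > min_count for c in counts_per_sample)


def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts for one sample to TPM.

    counts: dict of gene -> raw read count
    lengths_bp: dict of gene -> transcript length in base pairs
    """
    # Step 1: reads per kilobase of transcript (length normalization)
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Step 2: scale so that the values of one sample sum to one million
    scale = sum(rpk.values()) / 1e6
    return {g: v / scale for g, v in rpk.items()}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the rank decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_expressed(fold_change, adj_p, min_fold=1.5, alpha=0.05):
    """Apply the > 1.5-fold and adjusted p < 0.05 thresholds.

    Assumption: a fold change below 1 (downregulation) is judged by its
    reciprocal, so both directions are treated symmetrically.
    """
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude > min_fold and adj_p < alpha
```

In this sketch a gene would first pass `passes_count_filter`, its group-level p-value would be adjusted together with those of all other tested genes, and the thresholds would be applied last.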

Western blot.
Protein extracts were prepared from human left ventricles (LV) from the same cohorts of subjects in MAGNet as previously described [15]. Briefly, 60 mg of LV tissue was homogenized using a tissue homogenizer with a 5 × 75 mm flat-bottom stainless steel generator probe (OMNI International, THP115) in RIPA buffer (ThermoFisher, 89900) containing protease and phosphatase inhibitors (ThermoFisher, 78410 & 78420). Thirty µg of total protein was separated by SDS-polyacrylamide gel electrophoresis, transferred to nitrocellulose or polyvinylidene difluoride (PVDF) membranes, and blotted with primary antibodies overnight at 4 °C. The total lane density of transferred proteins stained with Ponceau S was used to control for loading and transfer differences. The primary antibodies used were as follows: SOX4 (Diagenode, CS129100), SOX8 (GeneTex, GTX129949), SOX9 (Millipore, AB5535), and SOX15 (ThermoFisher, 25415-1-AP). Secondary antibodies coupled to Alexa Fluor 680 (Invitrogen Molecular Probes) or IRDye 800 (LI-COR Biosciences) were used, and the Odyssey CLS infrared imager system (LI-COR Biosciences) was used to visualize Western blot signals. Odyssey version 1.2 imaging software was used to process all images.
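As a worked illustration of the loading control, band intensities can be divided by the Ponceau S total lane density of the same lane and then expressed relative to the NF group. The sketch below assumes that fold changes are ratios of group means; the function names and numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def normalize_to_loading(band_intensity, lane_density):
    """Divide a band signal by the total lane density of the same lane."""
    if lane_density <= 0:
        raise ValueError("lane density must be positive")
    return band_intensity / lane_density


def fold_change_vs_control(group_values, control_values):
    """Ratio of the group mean to the control (NF) mean."""
    return mean(group_values) / mean(control_values)


# Hypothetical (band intensity, lane density) pairs, for illustration only
nf_lanes = [(100.0, 1000.0), (120.0, 1200.0)]
dcm_lanes = [(250.0, 1000.0), (180.0, 900.0)]

nf_norm = [normalize_to_loading(b, d) for b, d in nf_lanes]
dcm_norm = [normalize_to_loading(b, d) for b, d in dcm_lanes]
dcm_fold = fold_change_vs_control(dcm_norm, nf_norm)
```

Normalizing each lane before averaging keeps a heavily loaded lane from inflating the apparent protein level of its group.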

Meta-analysis.
We analyzed the gene loci of interest using the Roadmap Epigenomics ChromHMM 25-state model across all cardiac tissue types; H3K27ac, H3K4me1, and H3K4me3 ChIP-seq peak data in the left ventricle; vertebrate phastCons evolutionary conservation values [16]; and genome-wide association study (GWAS) data from the cardiovascular disease knowledge portal (CVDKP) [17,18]. Only genomic variants associated with the HF phenotype with a p-value < 0.05 and located within ± 50 kb of the genes of interest were analyzed.
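The variant selection rule (HF-associated, p < 0.05, within ± 50 kb of the gene) can be sketched as a simple window filter. The `Variant` record, the function name, and the coordinates in the usage example are hypothetical placeholders, not values from the CVDKP or from this study.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    chrom: str
    pos: int        # genomic position on the chromosome (bp)
    p_value: float  # association p-value for the HF phenotype


def variants_near_gene(variants, chrom, gene_start, gene_end,
                       window_bp=50_000, alpha=0.05):
    """Select variants inside the gene body or within window_bp of it
    whose association p-value is below alpha."""
    lower = max(0, gene_start - window_bp)
    upper = gene_end + window_bp
    return [
        v for v in variants
        if v.chrom == chrom and lower <= v.pos <= upper and v.p_value < alpha
    ]


# Placeholder records and gene coordinates, for illustration only
example_variants = [
    Variant("example_1", "chr6", 995_000, 0.01),    # upstream, inside window
    Variant("example_2", "chr6", 1_060_000, 0.01),  # outside window
    Variant("example_3", "chr6", 1_010_000, 0.20),  # not significant
    Variant("example_4", "chr16", 1_000_000, 0.001),  # other chromosome
]
selected = variants_near_gene(example_variants, "chr6", 1_000_000, 1_005_000)
```

The selected variants would then be overlaid on the chromatin state and histone mark tracks to see which ones fall in putative regulatory elements.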

Results
We used RNA-seq to investigate the whole-transcriptome profiles of left ventricle (LV) tissues from patients suffering from HF due to DCM (n = 162) and HCM (n = 28), as previously described [15]. The transcriptome profiles of the SOX genes were extracted for comparison. Among the 20 SOX genes, eight were expressed in at least one of the three groups (cut-off > 1 TPM) (Table 1). The expressed SOX genes shared among the NF, DCM, and HCM groups were SOX4, SOX9, SOX7, SOX12, SOX17, and SOX18. Interestingly, these expressed SOX genes fall into three subgroups of SOX TFs: SOXC, SOXE, and SOXF. All the SOXF group genes (SOX7, SOX17, and SOX18) were present in NF, DCM, and HCM. Two of the three SOXC genes, SOX4 and SOX12, were expressed in NF and HF. One SOXE gene, SOX9, was detected in all three groups. Two genes were expressed only in failing human LVs and not in non-failing ones: SOX8, which belongs to the E group, and SOX15, a G group SOX gene. Some of these expressed SOX genes displayed differential expression between the non-failing and failing groups (fold change ≥ 1.5). These included SOX4 (SOXC), SOX8 and SOX9 (SOXE), and SOX15 (SOXG). The SOX4 RNA level was upregulated by 1.6- and 2.1-fold in DCM and HCM, respectively, compared to NF, whereas the SOX8 RNA level was significantly increased by 2.6-fold in DCM and by 2.8-fold in HCM compared to the NF LVs (Fig. 1A & D and Table 1). To a lesser extent, SOX9, another E group gene, was increased by more than 1.5-fold in HCM but not in DCM (fold change 1.14; Fig. 1G and Table 1). Additionally, SOX15, the sole member of SOXG, was increased by 1.9-fold in both DCM and HCM compared to NF (Fig. 1J and Table 1). In contrast, the other SOX genes expressed in the heart (SOX12, SOX7, SOX17, and SOX18) showed no significant differences between the NF and HF groups (Table 1).
We validated the protein levels of the differentially expressed SOX genes in the three groups using Western blot. The SOX4 protein level was significantly elevated by 2.27-fold and 3.67-fold in DCM and HCM, respectively (Fig. 1B-C). The SOX8 protein was barely detected in the NF group, which was consistent with its RNA level (TPM < 1), whereas the SOX8 protein level increased 2.83-fold in DCM and, more robustly, 6.78-fold in HCM (Fig. 1E-F). However, the SOX9 and SOX15 protein levels were not significantly different among the three groups (Fig. 1H-I & K-L).
Co-expressed gene network analysis of the genes expressed in HCM or DCM identified several genes associated with SOX4 in the left ventricle. These genes included the collagen-related genes COL6A1 and COL6A2; a putative calcium-binding protein, reticulocalbin-3 (RCN3); an enzyme involved in cell migration, proliferation, and the epithelial-to-mesenchymal transition, dihydropyrimidinase-related protein 3 (DPYSL3) [19]; a secreted sulfated glycoprotein, C-type lectin domain containing 11A (CLEC11A); an actin-binding protein, retinoic acid-induced protein 14 (RAI14); and an ETS transcription factor, ELK3 (Fig. 2A). However, we did not observe any genes significantly linked to SOX8. Among the SOX4-associated genes, CLEC11A, COL6A2, DPYSL3, and RCN3 were significantly upregulated by at least 1.5-fold in both DCM and HCM, whereas COL6A1 and ELK3 were increased considerably only in DCM or only in HCM (Fig. 2B).
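The idea behind a co-expression analysis can be sketched with a pairwise Pearson correlation across samples. The internal method of NetworkAnalyst is not described here, so Pearson correlation, the 0.5 cut-off, and the function names below are assumptions chosen purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no co-variation signal
    return sxy / sqrt(sxx * syy)


def coexpressed_genes(target, expression, min_abs_r=0.5):
    """Return {gene: r} for genes whose correlation with the target gene
    has an absolute value of at least min_abs_r.

    expression: dict of gene -> list of per-sample expression values
    """
    reference = expression[target]
    result = {}
    for gene, values in expression.items():
        if gene == target:
            continue
        r = pearson_r(reference, values)
        if abs(r) >= min_abs_r:
            result[gene] = r
    return result
```

A call such as `coexpressed_genes("SOX4", tpm_by_gene)` (with `tpm_by_gene` a hypothetical per-sample TPM table) would list candidate partners that rise and fall together with SOX4 across individuals.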
We subsequently analyzed chromatin state data and integrated them with the GWAS signals for both the SOX4 and SOX8 loci from public data resources. We included all the GWAS signals and variants associated with the HF phenotype from the cardiovascular disease knowledge portal (CVDKP) [17,18] and compared them with the Roadmap Epigenomics data [16]. The Roadmap Epigenomics data revealed several active enhancers along with transcriptional activity at both gene loci in fetal heart, left and right ventricular, and atrial tissues (Fig. 3). Several genomic variants with a p-value < 0.05 were found within ± 20 kilobases (kb) of the SOX4 and SOX8 loci and overlapped with active cis-regulatory element signatures in the fetal heart and in left and right ventricular tissues. At the SOX4 locus, one variant, rs192898967, located 5.8 kb upstream, overlapped with the DNase hypersensitivity signal, the H3K27ac and H3K4me1 enhancer signatures, and the H3K4me3 promoter mark, indicating that the variant lies in a cis-regulatory element that may interact with a trans-acting factor to regulate the transcription of SOX4. Additionally, variants 2-4 and 5-9 were located in the 3' UTR and downstream of the gene, respectively (Fig. 3A). Variants 5-9 were more frequent than the others. At the SOX8 locus, a broad variety of cis-regulatory elements was present. However, its neighboring gene, LMF1 (Lipase Maturation Factor 1), is only 10 kb away from the transcription start site of SOX8. Therefore, both genes may share the same cis-regulatory elements. Nevertheless, we observed two variants, rs12448761 and rs552159903, located 7 kb and 12 kb upstream of SOX8, that also lie within an intron of LMF1. These regions overlapped with active enhancer marks as well as DNase hypersensitivity peaks, suggesting that they may be involved in controlling the expression of both genes (Fig. 3B).
Eight additional variants were found in the SOX8 coding regions, in the 3' UTR, and in the downstream regions. Even though these eight variants are not linked with active cis-regulatory elements, they occurred more frequently than the two variants located upstream of SOX8; therefore, they may be more likely to contribute to the association with HF.

Discussion
To our knowledge, the present study is the first to systematically analyze the myocardial tissue levels of SOX family proteins in different types of cardiomyopathies leading to HF in humans. We identified four differentially expressed SOX genes between non-failing and failing human hearts. The protein levels of two of them, namely SOX4 and SOX8, were elevated in the left ventricle tissues of hearts from patients who had DCM or HCM compared to the NF controls.
We previously reported that SOX4 protein and RNA levels were significantly elevated in DCM tissues [15]. Here, we further analyzed its protein and RNA levels in HCM tissues to confirm these findings (Fig. 1). Our data suggest that the elevation of SOX4 is associated with a common pathway shared in the development and progression of HF. However, the expression levels of SOX4 in other cardiovascular conditions causing HF remain to be determined. Using a co-expressed gene-associated network analysis, we found that CLEC11A, COL6A2, DPYSL3, and RCN3 were positively associated with SOX4 (Fig. 2). Interestingly, CLEC11A, COL6A2, RCN3, and SOX4 are all involved in the fibrosis of various tissues [20-23]. Therefore, it would be intriguing to investigate whether they contribute to the development of cardiac fibrosis. Further, DPYSL3 has been shown to regulate cell mitosis, migration, and epithelial-mesenchymal transition processes in breast cancer [19]. These cellular events are all associated with the pathological processes of HF. It will be interesting to further explore the role of these processes and their relationship with SOX4 in HF. ELK3 was found to be significantly upregulated in HCM but not in DCM. This gene is known for its role in the inhibition of iNOS [24]. Interestingly, it was shown that iNOS mRNA can be detected in endomyocardial tissues from patients with DCM but not HCM [25]. The upregulation of ELK3 in HCM suggests the involvement of this TF in suppressing the expression of iNOS in HCM. Nevertheless, the role of ELK3 in HF has not been reported.
Although we did not find any genes associated with SOX8 in the human LV, this does not exclude the involvement of SOX8 in regulating cardiac function, especially in earlier pathogenic processes. Most SOX8 studies to date have focused on its role in reproduction, neural development, and cancers [26-28], so its cardiac role remains largely unexamined. SOX8 belongs to the same SOXE group as SOX9, which plays a key role in regulating cardiac fibrosis in HF secondary to ischemic heart disease [7]. Because functional redundancy occurs within the same group of SOX factors, SOX8 may function similarly to SOX9 in cardiac fibrosis in HF. Interestingly, we did not detect a significant change in the SOX9 protein among NF, DCM, and HCM, despite the fact that its RNA level was increased in HCM (Fig. 1G-I). It has been reported that SOX9-expressing cells were found near the infarcted area in human ischemic hearts [7]. Therefore, SOX9-positive cells may only be found regionally, close to the fibrotic regions of LVs in DCM and HCM. Furthermore, unlike SOX9, both SOX8 RNA and protein levels were significantly elevated in DCM and HCM (Fig. 1), suggesting that SOX8 operates differently in regulating the pathological processes of the heart. It will be interesting to explore this possibility further.
Using a meta-analysis combining Roadmap Epigenomics and GWAS data from CVDKP for SOX4 and SOX8 loci, we reported several genomic variants associated with HF. Some of these variants are within the DNA regions related to active cis-regulatory elements, such as enhancers, suggesting their potential involvement in gene regulation. However, further functional validation of these variants will be necessary to determine whether they contribute to the gene expression level of SOX4 and SOX8 and whether they are risk alleles for HF.
Gene expression levels are precisely controlled by TFs and the transcriptional machinery associated with them. Thus, targeting TFs to activate or inhibit genes specifically under the desired conditions is an innovative therapeutic approach. As TFs, SOX proteins exert their transcriptional functions not only by recognizing and binding to a specific DNA sequence motif but also by recruiting specific TF partners, such as SOX9 with steroidogenic factor 1 during gonadal development, or by dimerizing, as SOX9 does in chondrogenesis. Therefore, it is logical to target SOX TFs by interrupting their protein-protein interactions or by interfering with their binding to specific DNA sequences [29]. However, at present, an efficient method has been designed to target only one SOX factor, SOX18, to mitigate the progression of cancer [30]. This is mainly because (1) SOX proteins of the same family usually operate in a mutually redundant manner, (2) SOX proteins often have multiple protein partners, and (3) delivering drugs to the nucleus to target TFs is challenging. Therefore, understanding the protein structure, the organization of the protein-protein interactions, and the protein-DNA interface of each SOX protein at the molecular level will allow treatments aimed at these features to be designed rationally and effectively. Additionally, the protein-binding partners of SOX4 and SOX8 remain unknown. Thus, it will be essential to study the protein-protein interactions of both SOX proteins in the process of HF in the future.
Posttranslational regulation has been shown to play a vital role in controlling SOX protein activity and stability. Acetylation of SOX proteins at specific residues can promote transcriptional activity, whereas phosphorylation or SUMOylation can cause the degradation of SOX proteins [31,32]. For example, it was shown that the acetylation of SOX4 at lysine 95 resulted in chromatin remodeling during myoblast differentiation [33]. Targeting the enzymes responsible for these posttranslational modifications (PTMs) is another promising approach for HF drug discovery. Understanding the PTM machinery for SOX proteins in the heart, particularly for SOX4 and SOX8 as well as for their groups, SOXC and SOXE, will facilitate drug discovery for HF.

Conclusions
In summary, we identified two SOX TFs, SOX4 and SOX8, that are differentially expressed between non-failing and failing hearts in human left ventricle tissues, along with their associated genes. Several genomic variants linked with SOX4 and SOX8 were reported in this study, and further elucidation of their functional roles and underlying molecular mechanisms is warranted.

Declarations
Ethics approval and consent to participate
The study was approved by the Cleveland Clinic Institutional Review Board, and informed consent was obtained from all donors and patients.

Consent for publication
Not applicable

Availability of data and materials
The MAGNet RNA-seq data set can be found in the GEO database (GSE141910). The rest of the datasets used and/or analyzed during the current study are available on reasonable request from the corresponding author.