The AGT Epistasis Pattern Proposed a Novel Role for ZBED9 In Regulating Blood Pressure: Tehran Cardiometabolic Genetic Study (TCGS)

Backgroung: Hypertension is typically considered as the leading risk factor for cardiovascular disease. Epistasis studies may add another layer of complexity to our understanding of the genetic basis of hypertension. Methods: A nested case-control design was used on 4214 unrelated Tehran Cardiometabolic Genetic Study (TCGS) adults to evaluate 65 SNPs of previously associated genes, including ZBED9, AGT, and TNXB. The integrated effect of each gene was determined using the Sequence-based Kernel Association Test (SKAT). We used model-based multifactor dimension reduction (Mb-MDR) and entropy-based gene-gene interaction (IGENT) methods to determine interaction and epistasis patterns. Results: The integrated effect of each gene has a statistically signicant association with blood pressure traits (P-value < 0.05). Single-locus analysis identied two missense variants in ZBED9 (rs450630) and AGT (rs4762) that are associated with hypertension. In the ZBED9 gene, signicant local interactions were discovered. The G allele in rs450630 showed an antagonistic effect on hypertension, but interestingly, IGENT analysis revealed signicant epistasis effects for different combinations of ZBED9, AGT, and TNXB loci. Conclusion: We discovered a novel interaction effect between a signicant variant in an essential gene for hypertension (AGT) and a missense variant in ZBED9, which has shifted our focus to ZBED9's role in blood pressure regulation. and AGT and TNXB in linkage analysis with high blood pressure events in the Iranian population. Several studies have reported that interaction may unravel the main effects of genetic variants that are not associated in the single-locus analysis (25,26). Further, epistasis effects may explain genetic contribution on high blood pressure somewhat and disclose the mystery behind missing heritability (27). Our results contribute to previous research on hypertension, where the protective effect of rs450630 (G>A) was reported in patients with hypertension. Results on independent effects of ZBED9: rs450630 showed that this variant exhibits a strong association with hypertension, and the results of epistasis and local interaction exacerbate this association, in which all the interaction combinations, whether with a homozygous or heterozygous genotype of other variants, were determined as antagonistic if ZBED9: rs450630 has heterozygous genotype. Hence, it appears that ZBED9: rs450630 has a deterministic role in the interaction patterns of the genetic variants on hypertension.


Introduction
Hypertension (HTN), as a silent killer, is one of the leading causes of death due to cardiovascular diseases (1).Globally, a number of 1.38 billion people with high blood pressure (systolic blood pressure (SBP) ≥140 mmHg and/or diastolic blood pressure (DBP)≥90 mmHg had been estimated in adults age over 18 years worldwide in 2010 (Mills et al. 2016; Mills, Stefanescu, and He 2020).However, the incidence and prevalence of HTN have been distributed disproportionally with a high burden of disease in low-to middle-income countries (4).
Epidemiological ndings had stressed the fact that high blood pressure is primarily due to ageing, physical inactivity and unhealthy diets including, high sodium and low potassium intake, in the presence of a genetic predisposition (Mills et al. 2016;Mills, Stefanescu, and He 2020).As a complex disease, at least 30% of SBP and DBP variances are attributed to genetic factors in family-based heritability analysis (5).Further, genome-wide association studies (GWAS) have identi ed a large number of loci that are associated with blood pressure traits in diverse populations, of which the majority of genetic variants have been reported in individuals with European Ancestry (6).
Iran, as a middle-income country, is forced to halt the increasing rate of hypertension.According to national reports, almost half of Iranian adults live with high blood pressure (7,8).Similar to GWAS ndings in the European population, AGT and TNXB with a consistent effect on SBP, DBP and HTN are contributed to blood pressure regulation among Iranian families (9).Further, a recent genome-wide association study in the Iranian population discovered a new locus (ZBED9) for hypertension that was con rmed in a linkage analysis on different sets of blood pressure traits and replicated in summary statistics of UK Biobank on DBP and HTN (10).
Accordingly, the current study aimed to approve epistasis effects of AGT, ZBED9, and TNXB on blood pressure traits using different statistical approaches, including non-parametric machine-learning algorithm, Model-based Multifactor Dimensionality Reduction (Mb-MDR), and entropy-based MDR interaction network.

Study participants and study design
Tehran Cardiometabolic Genetic Study (TCGS) is a family-based genetic database according to a larger strudy, Tehran Lipid and Glucose Study (TLGS) (11)(12)(13).TLGS is a population-based cohort started in 1999, and study subjects have been followed up through six phases (every three years) on non-communicable disorders' (NCDs) and related risk factors (14).The current study extracted 4,214 unrelated participants aged 18 or above, based on the well-designed protocol in previous GWAS in TCGS (9,10).HTN incident cases and a random sample of healthy individuals with two or more follow-up records were included in the study using a nested case-control sampling design.

Measurements
Principles for clinical investigations could be found in the original TCGS and TLGS papers (19,22).Having SBP ≥140 mmHg or DBP ≥90 mmHg or taking antihypertensive treatment was hypertension de nition.Risk factors of high blood pressure traits (SBP, DBP, and HTN) were chosen based on previous GWAS literature(6,9,10,15).The following predictors were included in the model as covariates: sex, age, and body mass index (BMI).BMI was measured as the ratio of body weight to squared height (kg/m 2 ).Using the Amelia package in R, height and weight were imputed multiple times using the Expectation-Maximization method with Bootstrapping (EMB) approach, for which the missing rate of covariate's values ranged from 0.2-7.7% over ve phases of TLGS.Traits and their risk factors were tested for their distribution among males and females, using the T-test and Chi-square test.

Genotyping and quality control
Blood samples of TCGS participants were genotyped using humanOmniExpress-24-v1 bead chips on 652919 single nucleotide polymorphisms with an average mean distance of 4 kilobases for each individual at deCODE genetic company (16).Genomic quality control (QC) was implemented in comprehensive steps (10,17).

Testing the integrated effect of gene
Sequence Kernel Association Test (SKAT) was used to test the overall effect of AGT, ZBED9, and TNXB.Optimal SNP-set kernel-based association test (SKAT-O) is used to test the association between a set of variants in a region in the genome, either rare or common or both cumulative effects, and a trait (binary or quantitative) by a linear or logistic regression framework(18).SKAT-O obtains an aggregated score of individual variants and e ciently obtains a corresponding SNP-set P-value.Moreover, the covariates are adjusted in SKAT-O.In the case of the linear kernel, test statistic can be expressed as a weighted sum of the single-marker score test statistic.SKAT-O is usually considered as a variance component test that assumes the genotyping effects are random.P-value is provided for each set of variants of ZBED9, AGT and TNXB, separately.All analyses are done using the SKAT R package, version 2.0.1 (19).Readers can nd the work ow of gene-gene interactions between AGT, ZBED9, and TNXB involved in high blood pressure traits (SBP, DBP, and hypertension) in Supplementary Figure . 1

Making variant collection
After specifying genes, 65 variants, including three previously reported tag SNPs (ZBED9: rs450630, TNXB : rs2021783, AGT: rs2493134), were extracted in the TCGS database.Next, variants in the selected regions that were in low to moderate LD (r 2 ≤0.4) with the tag SNPs were extracted(N=13).The reliefF algorithm, as a machine learning approach for classi cation and feature selection, was applied to extract the most signi cant variants to reduce the computational expenses (20).The nal set consisted of 11 SNPs, including four variants located in TNXB and AGT and three in ZBED9.Supplementary Table .1 showes the information of genetic variants in ZBED9, AGT, and TNXB and their LDs.Figure 1 depicts the three genes.As illustrated in the gure, the ZBED9 gene contains two polymorphisms in exons 3 and 4, and one polymorphism in intron region 2. One polymorphism is located in exon 21 and the other three are located in the introns of the TNXB gene.Additionally, the AGT gene contains a variant in exon 2 and others in the intronic regions, and the interaction of these polymorphisms results in a signi cant association with hypertension.

Single Locus Analysis
The generalized Linear Model (GLM) was employed in four different genetic models to determine the optimal genetic model for every single variant and extract the effects of causal variants.These models included additive, dominant, recessive, and co-dominant models.To investigate the effect size of rare variants (MAF < 0.05), Firth's Bias-reduced logistic regression was used by the R logistf package (21).Coding labels for the additive model were de ned based on the minor allele count, as BB=0, AB=1, and AA=2.Dominant, recessive and co-dominant models were de ned as AA + AB versus BB, AA versus AB+BB and AB versus AA+BB genotypes, respectively.

Epistasis and Local Interactions
Model-based Multifactor Dimensionality reduction (MB-MDR) has been proposed by Calle et al. as a semiparametric machine learning dimension-reduction algorithm.MB-MDR incorporates three main steps, (i) constructing two-way contingency tables for the combination of SNPs (p-locus model) and creating a new categorical variable X. (ii) computing the wald statistic for the testing association of each X variable and the trait.(iii) conducting a permutation test to prevent in ating alpha error rate (Calle et al. 2008).In the rst step, two-way contingency tables are constructed for each SNP combination (p-locus model).Each cell in the contingency table is considered as a "Low Risk", "High-Risk", or "No-Effect" category, based on the student's t-test statistic, to test if the mean difference of a cell with the rest of the cells is either signi cant or non-signi cant.If the difference between cell means is signi cant at P-value=0.10, the cell will be labelled as either "Low Risk" or "High-Risk", based on the t statistic's sign; otherwise, it will be considered as "No-Effect".If the sign of the t statistic is positive, then the corresponding cell will be labelled as "High-Risk"; otherwise, it is "Low Risk".Thus, a new "categorical variable X= {"No effect", "Low effect", "High effect"}" will be created in the rst step of Mb-MDR.The Wald test statistic is conducted between the categorical variable X and the trait in the second step.We considered the maximum of "wald-low effect" and "wald-high effect", which are statistics for the mean differences of the "Low effect" and "High effect" category with two other categories, respectively.To avoid increasing type I error, a permutation test and bootstrapping as cross-validation tests are being assessed for evaluating the signi cance of the maximum of "wald-low effect" and "wald-high effect" for each p-locus model in the third step (Calle et al. 2008).
The current study employed this method to detect antagonistic and synergistic interactions within (local) and between loci of ZBED9 with AGT and TNXB.All interaction effects were calculated by the "mbmdr" package in R version 4.0.3software, and P-values were adjusted using the Bonferroni correction method.Multiple logistic regression was employed to capture the corresponding signi cant genotypes for each interaction model, both synergistic and antagonistic interactions.To obtain an entropy-based interaction graph and visualize pairwise interactions between variants, mdr 3.0.2software was used.To do so, independent effects of variants and their pairwise interaction effects were assessed and visualized using IGENT, entropy-based gene-gene interaction method for genome-wide interaction analysis.Fully description of the IGENT is described elsewhere (23).

Investigation of the missense variations affect the proteins
We applied the bioinformatics analysis to investigate the effects of the missense variation on the structure of the proteins coded by our interesting genes.The ZBED9 is an intrinsically disordered protein (IDP) [1,2].In the free and native conditions, IDPs have no single well-de ned tertiary structure.Therefore, the bioinformatics tools were applied to identify disordered regions and disordered binding regions of this protein.IUPred2A [3,4] is a valid and user-friendly webserver to Predict our desired information.The sequence of ZBED9 was obtained from NCBI with the protein Accession Q6R2W3.1.The 6i3f PDB le [5] of the RCSB database was used as the 3D structure of the AGT protein.The visualization of PDB les has been done by Chimera [6].

Basic description of genotyped samples
Supplementary Table. 2 presents the distribution of demographic and clinical characteristics of 4214 independent participants by sex with a similar distribution of age and signi cant differences in BMI, SBP, DBP, and HTN.Two variants (rs3130287(C>T) and rs11969759(C>T) were excluded by reliefF algorithm, and a set of intron, UTR and missense variants on ZBED9, AGT and TNXB were considered in the nal analysis.The majority of variants were in low to moderate LD (r 2 : 0-34%).Except for rs2021783 on AGT and rs449074 on ZBED9, remained variants were common with MAF>5%.

Epistasis analysis results
Table presents the epistasis models of genetic variants in low LD (r 2 < 0.1) on HTN, SBP, and DBP by Mb-MDR.Accordingly, rs450630 on ZBED9 has signi cant interactions with different variants from AGT and TNXB in various combinations.Except for those containing rs2239689 on TNXB, the antagonistic combinations including, three-locus and four-locus, had heterozygous genotypes (Supplementary Table 3).*: Effect size for HTN is de ned as odds ratio, and for SBP/DBP is de ned as regression coe cient.

Entropy-based Interaction
Figure 2-A displays the entropy-based radial interactions of a combination of 11 biomarkers.The interaction effects are categorized by colours, as yellow indicates no-epistasis, green is for redundancy, blue is for the antagonistic association, and red is for the synergistic association.The joint effect sizes are written on connecting lines.After removing three variants with no effect on HTN, the interaction effects of 8 variants were depicted in Figure 2-B.As the main effects on the diagram shows, ZBED9: rs450630 (0.36%) and AGT: rs4762 (0.11%) had the largest antagonistic effect sizes on HTN.

Impact of genetic variants on ACE2 and TMPRSS2
Firstly, the sequence of ZBED9 was analyzed.Figure 1 shows the disordered regions and disordered binding regions of this protein.rs450630 shows the ASP 686>GLU, the residue 686 of ZBED9 belong to a region with the score ~0.5 for binding.

Discussion
The purpose of this study was to discover the interaction patterns involved between recently reported associated genes to unravel susceptible multi-locus in the regulation of blood pressure.In the rst step, the integrated effect of ZBED9, AGT, and TNXB was assessed using a SKAT-O.Next, eleven variants were picked based on an LD threshold and relifF algorithm.The independent effect of each variant was assessed by single-locus analysis, in which ZBED9: rs450630 and AGT: rs4762 were associated with hypertension.Finally, Mb-MDR and entropy-based methods were considered to draw the epistasis patterns between and within AGT, ZBED9, and TNXB.The Mb-MDR algorithm showed the signi cant synergistic and antagonistic effects of multi-locus local interactions for ZBED9 and epistasis effects between AGT, ZBED9, and TNXB .
As a complex disease, high blood pressure stems from the interactions between multiple loci as well as environmental factors (24).Epistasis analysis supports joint effects of previous ndings on the associations of ZBED9 in GWAS and AGT and TNXB in linkage analysis with high blood pressure events in the Iranian population.Several studies have reported that interaction may unravel the main effects of genetic variants that are not associated in the single-locus analysis (25,26).Further, epistasis effects may explain genetic contribution on high blood pressure somewhat and disclose the mystery behind missing heritability (27).
Our results contribute to previous research on hypertension, where the protective effect of rs450630 (G>A) was reported in patients with hypertension.Results on independent effects of ZBED9: rs450630 showed that this variant exhibits a strong association with hypertension, and the results of epistasis and local interaction exacerbate this association, in which all the interaction combinations, whether with a homozygous or heterozygous genotype of other variants, were determined as antagonistic if ZBED9: rs450630 has heterozygous genotype.Hence, it appears that ZBED9: rs450630 has a deterministic role in the interaction patterns of the genetic variants on hypertension.
As mentioned above, Mb-MDR is a semiparametric machine learning dimension reduction algorithm.It is clear that Mb-MDR is a genotype-combination-based interaction testing method in which each p-locus model works with all combinations of genotype levels.This feature, along with the resampling-based approach, makes the Mb-MDR be less conservative than regression-based methods likes the Haseman-Elston regression (28).In the Haseman-Elston regression method, the interaction term is treated as a multiplication of SNPs scores in the model, while some information is likely missed.Mb-MDR discovers signi cant SNP×SNP and gene×gene interactions with disease susceptibility in both case-control and quantitative association studies in unrelated individuals.Furthermore, it has been reported that it gains high power in the presence of genetic heterogeneity, genotyping error, missing data and low sample size (29).
A family-based genome-wide linkage study of blood pressure traits (SBP, DBP, and THN) was conducted via a two-level Haseman-Elston regression model on the TCGS dataset (9).In this study, in addition to the signi cant main effects of genetic variants on ZBED9, the epistasis between AGT and TNXB was also detected.However, our research uncovered some other signi cant interactions through checking for different patterns of genotypes on ZBED9, AGT and TNXB .

Conclusion
We investigated the interplay between and within approved signi cant genes associated with high blood pressure traits (SBP, DBP, and THN) to determine the involved interaction patterns.To achieve this goal, following assessment of the aggregated effect of the ZBED9, AGT, and TNXB genes, making the collection of relevant variants based on LD threshold and relifF algorithm, and testing single SNP effects, the local interactions and epistasis were tested via two methods; Mb-MDR, and entropy-based methods.Compared to previously utilized models, two-level Haseman-Elston regression, on the Iranian population, our strategy provided stronger evidence for the existence of interaction and epistasis affecting high blood pressure traits.This diagram depicts the three genes ZBED9, TNXB, and AGT, as well as the locations of 11 signi cant polymorphisms in these genes on chromosomes 1 and 6.As illustrated in the gure, the ZBED9 gene contains two polymorphisms in exons 3 and 4, and one polymorphism in intron region 2. One polymorphism is located in exon 21 and the other three are located in the introns of the TNXB gene.Additionally, the AGT gene contains a variant in exon 2 and others in the intronic regions, and the interaction of these polymorphisms results in a signi cant association with hypertension.

Declarations
Independent effects of each are written on each node.Interaction effects are de ned by colors; Yellow is indicative of noepistasis, green is for redundancy, blue is for the antagonistic association, and red is for the synergistic association.The strength of the : Conceptualization, Writing -Original Draft, Writing-Reviewing and Editing, Software, Formal analysis.P.R: Software, Formal analysis, Writing -Original Draft.G.K and S.S: Validation.L.NHB and NA: Visualization.M.M: Phenotype Validation.H.L: Bioinformatic analysis.F.A: Supervision.M.SD: Supervision, Writing-Reviewing, and Editing.All authors read and approved the nal manuscript.Corresponding author Correspondence to Maryam S Daneshpour.All parts of this research work, design of the study, data collection, analysis, interpretation of data, and writing the manuscript, were funded by the Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.The funding body played no role in publication costs and also recognizes the scienti c support of deCODE (Reykjavik, Iceland).

Figures Figure 1
Figures This research was approved by the Medical Ethics Committee of Shahid Beheshti University of Medical Sciences and the National Committee for Ethics in Biomedical Research in May 2021 with code IR.SBMU.ENDOCRINE.REC.1400.024.All participants gave written informed consent to participate.The research has been performed in accordance with the Declaration of Helsinki.

Table 2
Epistatic (gene×gene) Results of the Mb-MDR model