Integration of Imaging Genomics Data for the Study of Alzheimer's Disease Using Joint-Connectivity-Based Sparse Nonnegative Matrix Factorization

Imaging genetics links microscopic genetic variation to macroscopic brain imaging phenotypes, enabling the identification of disease biomarkers. In this paper, we exploit prior knowledge about brain and genetic connectivity, which has significant reference value for investigating the correlation between the brain and genetics, to find more biologically meaningful biomarkers. We propose joint-connectivity-based sparse nonnegative matrix factorization (JCB-SNMF). The algorithm simultaneously projects structural magnetic resonance imaging (sMRI), single-nucleotide polymorphism (SNP), and gene expression data onto a common feature space, where heterogeneous variables with large coefficients in the same projection direction form a common module. In addition, the connectivity information of the brain regions and of the genetic data is added as prior knowledge to identify regions of interest (ROIs), SNPs, and genes associated with the risk of Alzheimer's disease (AD). GraphNet regularization increases the anti-noise performance of the algorithm and the biological interpretability of the results. The simulation results show that, compared with other NMF-based algorithms (JNMF, JSNMNMF), JCB-SNMF has better anti-noise performance, and the results on real data show that it can identify and predict biomarkers closely related to AD from significant modules. By constructing a protein-protein interaction (PPI) network, we identified SF3B1, RPS20, and RBM14 as potential biomarkers of AD. We also found some significant SNP-ROI and gene-ROI pairs. Among them, two SNPs (rs4472239 and rs11918049) and three genes (KLHL8, ZC3H11A, and OSGEPL1) may affect the gray matter volume of multiple brain regions. This model provides a new way to further integrate multimodal imaging genetic data to identify complex disease association patterns.


Introduction
Imaging genetics has been widely used in neurodegenerative diseases. It can explore the influence of genes on brain structure and function and use brain imaging to evaluate the impact of genes on individuals. Recently, it has made significant progress in studying the pathogenesis of Alzheimer's disease (AD) and mining AD-related biomarkers.
Canonical correlation analysis (CCA) (Parkhomenko et al. 2009) is an effective method for integrating two or more data modalities. It finds linear combinations of the variables in each modality that are maximally correlated with each other and thereby obtains the interrelated data components. However, because of the high dimensionality of imaging genetics data, accurate association analysis across modalities is challenging when samples are limited. Many scholars have added various sparsity constraints to CCA to avoid overfitting (Du et al. 2016, 2020; Yan et al. 2014).
Nonnegative matrix factorization (NMF) decomposes a nonnegative data matrix into two matrices: the base matrix W and the coefficient matrix H. In the early days, NMF was often used to integrate multi-omics data to reveal the hidden patterns and biological meaning in them. Zhang et al. proposed the joint NMF algorithm (JNMF) to extract common miRNA-gene-methylation data modules (Zhang et al. 2012). Considering the correlation between different modalities, Zhang et al. proposed the joint sparse network-regularized multiple NMF (JSNMNMF) (Zhang et al. 2011). They added the adjacency matrix among the data as prior knowledge to the JNMF model, making the results more biologically interpretable. In recent years, NMF and its various optimization algorithms have begun to play a role in the field of imaging genetics. Deng et al. used JSNMNMF to identify a competing endogenous RNA (ceRNA) co-module. They recently added orthogonal constraints to JSNMNMF and applied it to soft tissue sarcoma research (Deng et al. 2018). However, the study used only one type of imaging data and one type of genetics data and did not integrate additional omics data to investigate the complex mechanisms of the disease in depth. Wang et al. added group-level structure information in the data set to JNMF, proposed group sparse joint nonnegative matrix factorization (GSJNMF), and applied it to schizophrenia (Deng et al. 2020). On this basis, Peng et al. proposed group sparse joint nonnegative matrix factorization on orthogonal subspace (GJNMFO), which simultaneously performs orthogonal sparse constrained decomposition of the matrices and projects multimodal data into a low-dimensional orthogonal space. The above two algorithms add group sparsity information to NMF to identify hidden dependency structures among different data. However, they ignore brain connectivity information.
In our previous research, multi-constrained joint nonnegative matrix factorization (MCJNMF) was proposed to integrate fluorodeoxyglucose positron emission tomography (FDG-PET) imaging data and DNA methylation data. It successfully identified a soft tissue sarcoma (STS) lung metastasis module (Deng et al. 2020). Then, to further explore the relationship between histopathological imaging and genomic data, we proposed multidimensional constrained joint nonnegative matrix factorization (MDJNMF) (Deng et al. 2021). This method effectively identified the biological function modules related to sarcoma or lung metastasis and revealed a significant correlation between imaging features and genetic variation features.
To our knowledge, most imaging genetics research has focused on the correlation analysis between brain imaging and single-nucleotide polymorphisms (SNPs). However, gene expression data can be used to identify disease-causing genes significantly related to AD, which is of great significance for studying the complex mechanisms of diseases. In addition, brain connectivity information can reflect the degree of degeneration and randomization of the brain structure of AD patients. It allows the exploration of highly relevant weighted networks of regions of interest (ROIs), SNPs, and genes, from which sets of risk SNPs and genes positively correlated with AD can be detected. Therefore, brain and genetic connectivity information is used as prior knowledge in our study. The ability to detect the relationship between gray matter volume changes and genetic variation, and to explore a set of highly correlated ROIs/SNPs/genes closely related to AD and mild cognitive impairment (MCI), is of great significance. To integrate structural magnetic resonance imaging (sMRI), SNP, and gene expression data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/), we propose an improved joint nonnegative matrix factorization (NMF) method, named joint-connectivity-based sparse nonnegative matrix factorization (JCB-SNMF). Considering the correlation and the connectivity information in the biological network (including genetics and imaging data), we use the adjacency matrix and the GraphNet regularizer (Grosenick et al. 2013) as network regularization constraints to improve the accuracy and the noise immunity of the algorithm, respectively. GraphNet is a modified version of elastic net regularization that can effectively integrate physiological constraints such as connectivity. Its stability and robustness to interference were demonstrated in the joint-connectivity-based sparse canonical correlation analysis (JCB-SCCA) algorithm (Kim et al. 2020).
The GraphNet regularizer is used to incorporate the connectivity of the brain and of the genetic data into the algorithm. It encourages related nodes (ROIs, SNPs, or genes) to have similar coefficients. By applying it to the coefficient matrix H, we obtain more relevant features in the significant modules. The results show that, with brain connectivity information, the top 10 SNP-ROI and gene-ROI pairs contain multiple identical ROIs, SNPs, and genes. Through biological analysis of the significant genes, including Gene Ontology (GO) enrichment analysis and protein-protein interaction (PPI) network construction, we found that the biological processes in which the selected genes participate, as well as the selected ROIs, are closely related to neurodegenerative diseases such as AD. We performed receiver operating characteristic (ROC) analysis on the most strongly interacting genes and found potential genes for the diagnosis of AD and MCI, including splicing factor 3b subunit 1 (SF3B1), ribosomal protein S20 (RPS20), and RNA-binding motif protein 14 (RBM14). In addition, most of the genes harboring the SNPs in the significant module were closely related to AD. We drew a heat map of the selected SNP-ROI and gene-ROI pairs and found significant pairs, which may represent biomarkers of AD and MCI.

Joint Nonnegative Matrix Factorization (JNMF)
NMF is a traditional dimensionality reduction method, and its general model is as follows:

$$X \approx WH, \qquad \min_{W, H} \|X - WH\|_F^2 \quad \text{s.t. } W \ge 0,\ H \ge 0,$$

where $X \in \mathbb{R}^{n \times p}$ represents the original feature matrix, which is decomposed into $W$ and $H$ by NMF. $W \in \mathbb{R}^{n \times k}$ is called the basis matrix, $H \in \mathbb{R}^{k \times p}$ is called the coefficient matrix, $n$ is the number of samples, and $p$ is the number of features. NMF can effectively reduce the dimensionality of a single data set, but it cannot be applied to several modalities at the same time. Therefore, the JNMF model was proposed to solve this problem (Zhang et al. 2012), as follows:

$$\min_{W, H_I} \sum_{I} \|X_I - WH_I\|_F^2 \quad \text{s.t. } W \ge 0,\ H_I \ge 0,$$

where $X_I \in \mathbb{R}^{n \times p_I}$ ($I = 1, 2, \dots$) represents the original matrix of each modality, $W \in \mathbb{R}^{n \times k}$ represents the basis matrix shared by all modalities, and $H_I \in \mathbb{R}^{k \times p_I}$ represents the coefficient matrix obtained for each original matrix.
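The JNMF model above can be illustrated with a minimal sketch. It assumes the standard Lee-Seung-style multiplicative updates for a shared basis matrix; the function name `joint_nmf`, the iteration count, and the toy matrices are illustrative choices, not taken from the original study.

```python
import numpy as np

def joint_nmf(Xs, k, n_iter=200, eps=1e-10, seed=0):
    """Factorize each X_I (n x p_I) as W @ H_I with one shared nonnegative W."""
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[0]
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in Xs]
    errors = []
    for _ in range(n_iter):
        # Update the shared basis matrix W using all modalities at once.
        numerator = sum(X @ H.T for X, H in zip(Xs, Hs))
        denominator = W @ sum(H @ H.T for H in Hs) + eps
        W *= numerator / denominator
        # Update each coefficient matrix H_I with W held fixed.
        WtW = W.T @ W
        for i, X in enumerate(Xs):
            Hs[i] *= (W.T @ X) / (WtW @ Hs[i] + eps)
        # Track the summed squared Frobenius reconstruction error.
        errors.append(sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(Xs, Hs)))
    return W, Hs, errors

# Toy data: three nonnegative "modalities" that share one low-rank basis.
rng = np.random.default_rng(1)
W_true = rng.random((20, 3))
Xs = [W_true @ rng.random((3, p)) for p in (15, 30, 25)]
W, Hs, errors = joint_nmf(Xs, k=3)
```

Because the updates only multiply nonnegative quantities, W and every H_I stay nonnegative without an explicit projection step.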

Joint Sparse Network-Regularized Multiple Nonnegative Matrix Factorization (JSNMNMF)
Considering that the coefficient matrices obtained by JNMF are strongly independent of one another, and that n is often much smaller than p in practical applications, Zhang et al. (2011) proposed the JSNMNMF framework. JSNMNMF adds prior knowledge to the objective function to improve the biological relevance of the results, and it also makes module discovery more efficient by narrowing the large search space. Here, to strengthen the weak connection between imaging and genetics, we let $A_1$ be the MRI-SNP interaction adjacency matrix, $A_2$ the MRI-gene interaction adjacency matrix, and $A_3$ the SNP-gene interaction adjacency matrix. In addition, to obtain sparse solutions that reveal the key features of the data, JSNMNMF uses the method described by Kim and Park (2007) to control the sparsity of W and H. Therefore, its objective function is as follows:

$$\min_{W, H_I \ge 0} \sum_{I=1}^{3} \|X_I - WH_I\|_F^2 - \lambda_1 \mathrm{Tr}(H_1 A_1 H_2^T) - \lambda_2 \mathrm{Tr}(H_1 A_2 H_3^T) - \lambda_3 \mathrm{Tr}(H_2 A_3 H_3^T) + \gamma_1 \|W\|_F^2 + \gamma_2 \sum_{I=1}^{3} \sum_{j} \|h_j^I\|_1^2,$$

where $h_j^I$ is the $j$-th column of $H_I$, the parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weights of the adjacency constraints, $\gamma_1$ is used to limit the growth of $W$, and $\gamma_2$ is used to constrain $H_I$.

Joint-Connectivity-Based Sparse Nonnegative Matrix Factorization (JCB-SNMF)
To encourage the similarity of related elements of the norm vector, the connectivity-based penalty term (Kim et al. 2020) is introduced in the JSNMNMF algorithm. Specifically, suppose the connectivity between the i-th node and the j-th node (that is, the brain region or genetic data) is high. In that case, it will force the corresponding elements of the norm vector to be similar (Grosenick et al. 2013). Therefore, we add the brain connectivity information and the weighted SNP correlation network to capture the genetic network structure (Levine et al. 1613) as a prior matrix to add to the algorithm, aiming to improve the biological significance of the extracted features.
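The connectivity-based penalty on each coefficient matrix takes the form

$$P(H_I) = \mathrm{Tr}(H_I L_{h_I} H_I^T), \quad I = 1, 2, 3.$$

Here, $L_{h_1}$, $L_{h_2}$, and $L_{h_3}$ represent the Laplacian matrices of $X_1$, $X_2$, and $X_3$, respectively.
We regard the Laplacian matrices of $X_1$, $X_2$, and $X_3$ as a new fusion penalty of the JSNMNMF algorithm, and the specific formula is as follows:

$$\min_{W, H_I \ge 0} F_{\mathrm{JSNMNMF}}(W, H_1, H_2, H_3) + \sum_{I=1}^{3} \beta_I \mathrm{Tr}(H_I B_I H_I^T),$$

where $F_{\mathrm{JSNMNMF}}$ is the JSNMNMF objective function, $\beta_I$ are the weights of the connectivity penalties, and $B_1$, $B_2$, and $B_3$ represent the Laplacian matrices of $X_1$, $X_2$, and $X_3$, respectively.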
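To make the connectivity penalty concrete, the following sketch builds a graph Laplacian from a connectivity matrix and evaluates the trace penalty on a coefficient matrix. It assumes the unnormalized Laplacian (degree matrix minus connectivity matrix), which the text does not specify; the function names and the random toy connectivity are hypothetical. The check at the end shows why the penalty encourages connected nodes to have similar coefficients: it equals a connectivity-weighted sum of squared differences between their columns.

```python
import numpy as np

def graph_laplacian(C):
    """Unnormalized Laplacian D - C of a symmetric connectivity matrix C (assumed form)."""
    C = np.asarray(C, dtype=float)
    return np.diag(C.sum(axis=1)) - C

def graphnet_penalty(H, L):
    """Connectivity penalty Tr(H L H^T) for a k x p coefficient matrix H."""
    return np.trace(H @ L @ H.T)

rng = np.random.default_rng(0)
p, k = 6, 3                      # p nodes (ROIs, SNPs, or genes), k modules
C = rng.random((p, p))
C = (C + C.T) / 2                # symmetric toy connectivity
np.fill_diagonal(C, 0.0)         # no self-connections
H = rng.random((k, p))

L = graph_laplacian(C)
penalty = graphnet_penalty(H, L)

# Equivalent pairwise form: 0.5 * sum_ij C_ij * ||h_i - h_j||^2 over columns of H.
pairwise = 0.5 * sum(
    C[i, j] * np.sum((H[:, i] - H[:, j]) ** 2)
    for i in range(p) for j in range(p)
)
```

The penalty is zero only when strongly connected nodes carry identical coefficients, so larger weights on it pull connected features toward the same modules.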

The Efficient Optimization Algorithm
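We can now write the objective function of the proposed method with all penalties shown explicitly:

$$\min_{W, H_I \ge 0} F = \sum_{I=1}^{3} \|X_I - WH_I\|_F^2 - \lambda_1 \mathrm{Tr}(H_1 A_1 H_2^T) - \lambda_2 \mathrm{Tr}(H_1 A_2 H_3^T) - \lambda_3 \mathrm{Tr}(H_2 A_3 H_3^T) + \sum_{I=1}^{3} \beta_I \mathrm{Tr}(H_I B_I H_I^T) + \gamma_1 \|W\|_F^2 + \gamma_2 \sum_{I=1}^{3} \sum_{j} \|h_j^I\|_1^2,$$

where $\lambda_I$ weight the adjacency constraints, $\beta_I$ weight the connectivity (Laplacian) penalties, and $\gamma_1$ and $\gamma_2$ control the growth of $W$ and the sparsity of $H_I$. Let $L$ denote the Lagrangian of this problem, with multipliers $\Psi$ and $\Phi_I$ for the constraints $W \ge 0$ and $H_I \ge 0$. Setting the partial derivatives of $L$ with respect to $W$ and $H_I$ to zero and using the Karush-Kuhn-Tucker (KKT) conditions $\Psi_{ij} W_{ij} = 0$ and $(\Phi_I)_{ij} (H_I)_{ij} = 0$, we obtain update rules for $W_{ij}$ and $(H_I)_{ij}$, which are applied iteratively.

Next, module membership is determined from the z-scores of the entries of each $H_I$:

$$z_{ij} = \frac{h_{ij} - \mu_i}{\sigma_i},$$

where $h_{ij}$ represents an element of $H_I$, $\mu_i$ represents the mean of row $i$ of $H_I$, and $\sigma_i$ represents the corresponding standard deviation. We set a threshold $T$: a feature whose z-score is greater than $T$ is considered eligible to be assigned to the module.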
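The z-score rule for module membership can be sketched as follows. This is a minimal illustration that assumes row-wise standardization of a coefficient matrix; the function name `module_members`, the toy matrix, and the threshold value 1.5 are hypothetical, since the text does not report the value of T.

```python
import numpy as np

def module_members(H, T):
    """For each row (module) of H, return indices of features with z-score above T."""
    mu = H.mean(axis=1, keepdims=True)      # row means
    sigma = H.std(axis=1, keepdims=True)    # row standard deviations (assumed nonzero)
    Z = (H - mu) / sigma
    return [np.flatnonzero(Z[r] > T) for r in range(H.shape[0])]

# Toy coefficient matrix: 2 modules x 10 features, with a few dominant entries.
H = np.full((2, 10), 0.1)
H[0, 3] = 5.0
H[1, [1, 7]] = 4.0

members = module_members(H, T=1.5)   # T = 1.5 is an illustrative threshold
```

Applying the same function to each of the three coefficient matrices, and taking row r of each, would give the ROI, SNP, and gene members of module r.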

Estimating the significance of the co-expression module
In this section, we use the method described by Peng et al. (2020) to evaluate the significance of each module. Specifically, we assume that $a_i$, $b_j$, and $c_e$ are column vectors selected from $X_1$, $X_2$, and $X_3$, which form the matrices $A_Q$, $B_Q$, and $C_Q$ of the $Q$-th module. The mean correlation $\bar{\rho}$ among the three types of data in a module is computed from the correlations between these vectors (Formula (10)). For a given matrix $C_Q$, we randomly permute the row vectors of the matrices $A_Q$ and $B_Q$ in the $Q$-th module and repeat this process $\Delta$ times. For each permutation, $\bar{\rho}^*$ is the new mean correlation coefficient calculated by Formula (10) after permuting the rows of $A_Q$ and $B_Q$. The significance of the test statistic can be estimated as follows:

$$p = \frac{|\{\bar{\rho}^* \ge \bar{\rho}\}|}{\Delta},$$

where $|\cdot|$ denotes the number of permutations for which $\bar{\rho}^* \ge \bar{\rho}$. If the p-value is less than 0.05, we consider the module significant.
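The permutation procedure can be sketched in a few lines. The exact form of the mean correlation is not recoverable from the text, so this sketch assumes it is the average of the mean absolute Pearson correlations between the three pairs of blocks; all function names and the toy data are hypothetical.

```python
import numpy as np

def mean_abs_cross_corr(A, B):
    """Mean absolute Pearson correlation between the columns of A and the columns of B."""
    Az = (A - A.mean(axis=0)) / A.std(axis=0)
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    return np.abs(Az.T @ Bz / A.shape[0]).mean()

def module_statistic(A, B, C):
    """Assumed test statistic: average of the three pairwise block correlations."""
    return (mean_abs_cross_corr(A, B)
            + mean_abs_cross_corr(A, C)
            + mean_abs_cross_corr(B, C)) / 3

def permutation_p_value(A, B, C, n_perm=1000, seed=0):
    """Permute the rows of A and B (C stays fixed) and count statistics >= observed."""
    rng = np.random.default_rng(seed)
    observed = module_statistic(A, B, C)
    count = 0
    for _ in range(n_perm):
        A_perm = A[rng.permutation(A.shape[0])]
        B_perm = B[rng.permutation(B.shape[0])]
        if module_statistic(A_perm, B_perm, C) >= observed:
            count += 1
    return observed, count / n_perm

# Toy module: three blocks driven by one shared latent signal across 60 samples.
rng = np.random.default_rng(42)
z = rng.normal(size=(60, 1))
A = z + 0.5 * rng.normal(size=(60, 4))
B = z + 0.5 * rng.normal(size=(60, 5))
C = z + 0.5 * rng.normal(size=(60, 3))
rho, p = permutation_p_value(A, B, C, n_perm=200)
```

Shuffling rows breaks the sample alignment between blocks while keeping each block's own distribution, which is what makes the permuted statistics a reference for the no-association case.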

Selection of module elements
Through the iterative updates of the above algorithm, we finally decompose the feature matrices $X_1$, $X_2$, and $X_3$ of MRI, SNP, and gene data into the base matrix $W$ and the coefficient matrices $H_1$, $H_2$, and $H_3$. To find the features with distinctive weights in each module, we compute z-scores for the coefficients in each row of the $H_I$ matrices.

Data source and preprocessing
In this section, we evaluate the effectiveness of the proposed method on the ADNI database. From this database, we examine and select candidate SNPs to predict the sMRI phenotypes. As can be seen from Table 1, there were 180 non-Hispanic Caucasian participants with imaging and genotyping data, including 21 healthy controls (HC), 147 participants with MCI, and 12 participants with AD.
The original sMRI data were downloaded from ADNI1. Head motion correction was performed with the DiffusionKit software (Gorski et al. 2007), and the images were registered to the Montreal Neurological Institute (MNI) standard space. Next, the segmentation of sMRI was implemented using the CAT toolbox of the SPM software package in MATLAB (Saykin et al. 2010). Specifically, voxel-based morphometry (VBM) provides a voxel-wise estimation of the local amount or volume of a specific tissue compartment. The segmentations were adjusted by scaling with the volume changes due to spatial registration, and the volume of gray matter tissue in each ROI was calculated as a feature. After screening, 140 ROIs were retained.
The genotypes of the 180 subjects in this study came from the ADNI1 database. All SNPs were genotyped with the Human 610-Quad BeadChip. The genome-wide SNP sites of the samples were screened with the PLINK genetic analysis tool (Purcell et al. 2007), applying the following quality control steps: exclusion of rare SNPs (minor allele frequency [MAF] < 0.05), exclusion of SNPs violating Hardy-Weinberg equilibrium (HWE $p < 10^{-6}$), exclusion of subjects and SNP markers with a poor call rate (< 90%), gender check, and sibling pair identification. This resulted in a final data set spanning 5947 SNP loci.
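The MAF and call-rate criteria can be illustrated on a small genotype matrix. This is only a NumPy sketch of the two filters, not the PLINK procedure used in the study: it omits the HWE test, the gender check, and sibling identification, and it assumes genotypes coded as allele counts 0/1/2 with NaN for missing calls. The function name and toy matrix are hypothetical, and both call rates are computed on the unfiltered matrix for simplicity.

```python
import numpy as np

def qc_filter(G, maf_min=0.05, call_rate_min=0.90):
    """Filter a subjects x SNPs genotype matrix (0/1/2, NaN = missing) by call rate and MAF."""
    called = ~np.isnan(G)
    keep_subjects = called.mean(axis=1) >= call_rate_min   # per-subject call rate
    keep_snps = called.mean(axis=0) >= call_rate_min       # per-SNP call rate
    # Frequency of the counted allele among called genotypes (two alleles per subject).
    freq = np.nansum(G, axis=0) / (2 * called.sum(axis=0))
    maf = np.minimum(freq, 1 - freq)
    keep_snps &= maf >= maf_min
    return G[np.ix_(keep_subjects, keep_snps)], keep_subjects, keep_snps

# Toy data: 20 subjects x 12 SNPs, each SNP with allele frequency 0.5 by construction.
base = np.array([0, 1, 2, 1] * 5, dtype=float)
G = np.column_stack([base] * 12)
G[:, 1] = 0            # monomorphic SNP: MAF = 0, should be removed
G[:3, 2] = np.nan      # SNP with call rate 17/20 = 0.85, should be removed
G[5, 3:6] = np.nan     # subject with call rate 9/12 = 0.75, should be removed

filtered, keep_subjects, keep_snps = qc_filter(G)
```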
We used the limma package (Ritchie et al. 2015) to screen for genes with significant differential expression. Genes with a p-value greater than 0.01 were removed, leaving 1477 genes.
For the feature matrices of the three original data sets obtained by the above processing, we used the L2 norm to normalize the data to ensure the non-negativity of the input data. Specifically, since NMF is an unsupervised clustering method, we only put the three feature matrices composed of samples from the AD group and the MCI group into the algorithm. The samples of the HC group were used as controls during the preprocessing of the SNP and gene expression data. We then concatenated the feature matrices of the AD group and MCI group samples. Finally, we obtained three feature matrices $X_1 \in \mathbb{R}^{159 \times 140}$, $X_2 \in \mathbb{R}^{159 \times 5947}$, and $X_3 \in \mathbb{R}^{159 \times 1477}$, corresponding to the sMRI, SNP, and gene expression data.

Parameter Selection
In this section, we apply the proposed algorithm to the real sMRI, SNP, and gene expression data. According to Formula (6) in the Method section, we need to determine the optimal adjacency weights $\lambda_i$ and connectivity-penalty weights $\beta_i$ ($i = 1, 2, 3$), as well as the sparsity parameters $\gamma_1$ and $\gamma_2$, for the proposed algorithm. In the experiment, we randomly initialize a set of nonnegative $W$ and $H_I$ ($I = 1, 2, 3$). First, to examine the convergence of the proposed algorithm, we randomly select three sets of parameters and iterate 100 times. We then plot the objective function values obtained during the iteration process in Fig. 1. It can be seen from Fig. 1 that the objective function value of the proposed algorithm is greatly affected by the parameter settings. However, a blind grid search over all parameters is highly time-consuming. Thus, we fix the value of K to 10 and tune $\lambda_i$, $\beta_i$ ($i = 1, 2, 3$), $\gamma_1$, and $\gamma_2$ over the following finite set: {0.0001, 0.001, 0.01, 0.1, 1, 10}. Figure 2 shows the reconstruction errors obtained by sequentially substituting the 1296 ($6^4$) sets of regularization parameters into the algorithm. We use the set of regularization parameters corresponding to the minimum reconstruction error for further analysis.
As shown in Fig. 2, the reconstruction error is smallest for the 866th group of regularization parameters, namely $\lambda_i = 0.001$, $\beta_i = 0.0001$ ($i = 1, 2, 3$), $\gamma_1 = 0.1$, and $\gamma_2 = 0.0001$. Then, we fix $\lambda_i$, $\beta_i$, $\gamma_1$, and $\gamma_2$ and use the same method to determine K. NMF-based algorithms require $K \ll n$, so we set the upper limit of K to 50. The reconstruction errors of the proposed model for different values of K are shown in Fig. 3. As K increases, the reconstruction error gradually decreases.
However, when K is less than 6, we find that some common modules no longer contain any elements. Therefore, a K value of between 6 and 50 is chosen. When the K value is too large, it will cause changes in the overlap rate among modules, which is not conducive to subsequent analysis. Specifically, different K values will affect the Pearson correlation coefficients between the original and decomposed matrices and the element overlap ratio among the modules. Therefore, we compared the two indicators under different K values.
Specifically, we used the overlap rate calculation method consistent with previous research (Deng et al. 2021). The overlap rate is defined as

$$O = \frac{\operatorname{length}(\operatorname{intersect}(R_i, R_j))}{\min(\operatorname{length}(R_i), \operatorname{length}(R_j))},$$

where $R_i$ and $R_j$ represent the feature vectors of the $i$-th and $j$-th results, respectively, length() returns the number of features in a vector, and intersect() returns the intersection of the $i$-th result and the $j$-th result. Figure 4A-C shows the overlap rate of the ROI, SNP, and gene modules, respectively, under different K values. As the K value increases, the overlap rate of members in the ROI and gene modules gradually decreases, whereas the overlap rate of members in the SNP modules gradually increases. Figure 4D-F shows the respective Pearson correlation coefficients between $X_I$ and $WH_I$ ($I = 1, 2, 3$) under different K values. As the value of K increases, the Pearson correlation coefficients between $X_1$ and $X_2$ and their respective reconstruction matrices gradually increase, whereas the Pearson correlation coefficient between $X_3$ and its reconstruction matrix gradually decreases. Based on Fig. 4A-F, a value of about 30 modules can be regarded as a balance point. Based on the above considerations, we set K to 30.
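The overlap rate is simple to compute once module members are stored as lists of feature identifiers. The sketch below is a direct transcription of the definition; the function name and the SNP-like labels are placeholders rather than results from the study, and both modules are assumed to be non-empty.

```python
def overlap_rate(r_i, r_j):
    """Overlap rate: size of the intersection divided by the size of the smaller set."""
    r_i, r_j = set(r_i), set(r_j)
    return len(r_i & r_j) / min(len(r_i), len(r_j))

# Two hypothetical modules sharing two of their members.
module_a = ["rs1", "rs2", "rs3", "rs4"]
module_b = ["rs3", "rs4", "rs5"]
rate = overlap_rate(module_a, module_b)   # 2 shared members / 3 members in the smaller module
```

Normalizing by the smaller module means that a module fully contained in a larger one has an overlap rate of 1.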
To prevent the objective function from falling into a poor local minimum, we repeated the whole procedure 100 times with different initialization values. The W, $H_1$, $H_2$, and $H_3$ corresponding to the smallest objective function value were used for further analysis. The objective values are shown in Fig. 5. Finally, we selected the 18th set of initialization values. In addition, we used the two indicators previously applied to select K to assess the reproducibility of the algorithm across repeated runs. As shown in Fig. 6, across the different initializations, the overlap ratios of the three types of module members and the Pearson correlation coefficients between the three matrices and their respective reconstruction matrices are relatively stable. The average overlap ratios of the three types of module members are 0.7860, 0.6776, and 0.7443, respectively. This shows that the proposed algorithm is robust and reproducible.

Results on the ADNI database
We ran the proposed algorithm on the ADNI data preprocessed as described above. The Pearson correlation coefficients between the original matrix and the reconstructed matrix of ROIs, SNPs, and genes are 0.9962, 0.5534, and 0.9921, respectively. Thirty co-expression modules were also obtained. In addition, we measured the computational cost of the three algorithms. We ran the experiments on the real data sets on a machine with an Intel Core i5-8300H CPU and 16 GB RAM and used MATLAB (R2017a) 64-bit for the general implementation. The computational costs of the proposed algorithm, JSNMNMF, and JNMF were 19.02 s, 13.46 s, and 10.23 s, respectively.
To verify the correlation analysis capability of the proposed algorithm, we performed KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis on the gene expression data in all modules. We extracted the seven most enriched biological process keywords and counted the number of keywords involved in each module. It can be seen from Fig. 7 that there are 19 modules involving all seven keywords. Toll-like receptors can initiate pro-inflammatory immune responses by activating NF-κB and other transcription factors that cause the synthesis of pro-inflammatory molecules, and they play an important role in diseases related to neuroinflammation (AD, Parkinson's disease). Mahmoudvand et al. (2016) showed that immune cells in mice infected with toxoplasmosis promoted neuroinflammation through a cytokine network and enhanced cognitive impairment in AD mice. Influenza A is also closely related to AD. The accumulation of β-amyloid (βA) can aggravate AD, but it also inhibits the influenza A virus (White et al. 2014). Therefore, the proposed algorithm can effectively select disease-related modules, and different modules are representative to a certain degree.
We used Formula (11) in the Method section to select the five modules with p < 0.05, as shown in Table 2. Among these, Modules 6 and 10 contain more than 60% of the total number of SNPs and are not considered further. We draw a Venn diagram for the three data features in the other modules and compare the escape rates of the three features in different modules. As shown in Fig. 8, the escape rate of Module 1 is the smallest among the three types of data, so we choose Module 1 for further analysis.
Table 3 lists the ROIs and genes identified by the proposed algorithm from the SNP loci. As can be seen from Table 3, the cerebral white matter (Nasrabady et al. 2018), angular gyrus (Carbonell et al. 2014), precentral gyrus (Willette et al. 2015), supramarginal gyrus (Redolfi et al. 2015), superior parietal lobule (Yamashita et al. 2014), and superior temporal gyrus (Ramos et al. 2015) are confirmed to be risk brain regions for AD. We also identify a total of 47 risk genes from the SNP data in the first module. Studies (Lu et al. 2016; Sherva et al. 2014) have found that DHX57 and SPON1, which are associated with accelerated cerebellar age, significantly affect AD. In two other imaging genetics studies, EOMES and RGS6 were also identified as risk genes (Khondoker et al. 2015; Moon et al. 2015). The unregulated expression of PPP2R2C may be related to the onset of AD (Leong et al. 2020). The amyloid precursor protein (APP) plays a central role in AD, and CASK is an interactor of the APP intracellular domain (AICD) (Silva et al. 2020). Maphis et al. (2017) confirmed that PCSK2 was upregulated as a differential gene in the hippocampus in a mouse model of tauopathy. CNTNAP5 belongs to the contactin-associated protein (Caspr) family and is related to various neurodegenerative diseases (Zou et al. 2017). SLC9A9 is also associated with neuropsychiatric diseases (Patak et al. 2017).
We found several SNP sites related to dementia and neurological diseases in Module 1. Frailty is a complex phenotype of aging. One study found that the prevalence of haplotypes of risk alleles on rs1324192 was significantly higher in frail older adults than in non-frail older adults (Sathyan et al. 2018). The intronic SNP rs3802890 was shown to be associated with female autism characteristics (Mitjans et al. 2017). Koga et al. (1996) found that rs981975 was the SNP most highly correlated with changes in anti-schizophrenia dose. Using machine learning, Nguyen et al. (2015) predicted that rs12185438 was highly related to Parkinson's syndrome.

Biological significance
A total of 258 genes were selected from the gene expression data set. As shown in Fig. 9, we performed GO enrichment analysis on these genes. Frost (2016) confirmed that dysfunction of the nucleoskeleton is a causal factor for Alzheimer's disease-related neurodegeneration. Wheeler et al. (2019) confirmed that the activity of the poly(A)-binding protein MSUT2 determines susceptibility to pathological tau in the mammalian brain. Most of the other enriched biological processes are also closely related to AD.
Fig. 9 The results of GO enrichment analysis for the genes selected in Module 1. The horizontal axis is the number of genes in the pathway, the vertical axis is the pathway list, and the red to blue color indicates p-values ranging from 0 to 10
We also constructed a PPI network for the genes selected in Module 1. As can be seen in Fig. 10, we retained 177 genes with interaction relationships. We selected the 10 genes with the largest area (strongest interaction), namely SF3B1, BPTF, TLR4, MTOR, LSM3, SRRM1, PCBP1, HNRNPA3, RPS20, and RBM14. Among these, Huang et al. (2017) confirmed that in the initial stage, the activation of TLR4 has an effective clearance effect on amyloid β (Aβ), whereas a long-term activation state causes Aβ to be deposited in the brain. Kou et al. (2019) noted that AD can be prevented and treated by regulating the mTOR signaling pathway. Tao et al. (2020) confirmed that LSM3 is the main pathogenic gene of AD and the core gene in the module network closely related to MCI and AD. The biological functions of HNRNPA3 products are related to inflammatory and neurodegenerative diseases (Cruz-Rivera et al. 2018). These significant genes were investigated by receiver operating characteristic (ROC) curve analysis using IBM SPSS Statistics 22. We report the genes with area under the curve (AUC) values greater than 0.5 and p-values less than 0.05 in Fig. 11 and Table 4. In Fig. 11, we show the ROC curves of three significant genes.
In Table 4, we show the detailed information of the ROC curves of the three genes. There are three genes with AUC values greater than 0.6 and p-values less than 0.05. The highest AUC was found for RBM14 (AUC: 0.691, 95% CI: 0.577-0.805, p = 0.007), followed by SF3B1 (AUC: 0.656, 95% CI: 0.525-0.788, p = 0.026) and RPS20 (AUC: 0.641, 95% CI: 0.504-0.777, p = 0.045). Therefore, the proposed algorithm confirmed several AD and MCI risk genes in Module 1 and found three genes potentially related to AD and MCI.
Figure 12 shows the pairwise correlation heat map of brain ROI-SNP/gene pairs for the SNPs, genes, and brain ROIs selected in Module 1. As expected, most ROI-SNP/gene correlations are strong. To find significantly stronger SNP/gene-ROI pairs, we show the top 10 pairs with p < 0.01 in Module 1 in Table 5. Although no SNP in the SNP-ROI pairs has yet been reported, most of the ROIs have been confirmed to be risk brain areas of AD/MCI, so the obtained SNP-ROI pairs still have reference value. Combining our conclusions and reports in the literature (Nasrabady et al. 2018), the variation of rs4472239 may play an important role in the abnormality of cerebral white matter, thereby promoting the development of MCI and AD. rs11918049 is significantly associated with multiple frontal regions and the central anterior gyrus. The genes in the top 10 gene-ROI pairs are concentrated in KLHL8, ZC3H11A, and OSGEPL1. Based on the typology described in the literature (Donlon and Morris 2019), ZC3H11A can cause immune system disorders related to aging. Abel et al. (2020) showed that there was no significant change in the expression of the KEOPS complex (including OSGEPL1) in the brains of patients with depression and schizophrenia, but whether functional changes in the KEOPS complex are

Comparison of anti-noise performance with other algorithms
Here, we used the Laplacian matrix as a new penalty term to process the imaging and genetic data, which can improve stability and anti-noise performance. We conducted experiments on two synthetic data sets, one with larger and one with smaller sample size and numbers of features. We used $n$ to represent the number of samples, $p_1$ to represent sMRI features, $q_1$ to represent SNP features, and $q_2$ to represent gene features. In the large set, we set $n = 300$, $p_1 = 2000$, $q_1 = 2500$, and $q_2 = 1000$. In the small set, we set $n = 100$, $p_1 = 650$, $q_1 = 350$, and $q_2 = 600$. In addition, we set K = 10. Following Peng et al. (2020), the elements of the base matrix $W$ and the coefficient matrices $H_1$, $H_2$, and $H_3$ are all random integers drawn from a uniform distribution $U(1, 10)$.
We then used $\varepsilon_i$ to represent the Gaussian noise and $l$ to represent the noise level, and added noise to the data as

$$\tilde{x}_i = x_i + l\,\varepsilon_i \quad (i = 1, 2, \dots, n),$$

where $x_i$ and $\tilde{x}_i$ denote the $i$-th noise-free and noisy samples.
It can be seen from Fig. 13 that, as the noise increases in the two data sets, the reconstruction error and objective function value of the proposed algorithm are both smaller than those of JNMF and JSNMNMF. In addition, we measured the computational cost of the three algorithms. We ran the experiments on the synthetic data sets on a machine with an Intel Core i5-8300H CPU and 16 GB RAM and used MATLAB (R2017a) 64-bit for the general implementation. In the small set, the computational costs of the proposed algorithm, JSNMNMF, and JNMF were 7.49 s, 7.33 s, and 7.28 s, respectively. In the large set, the computational costs of the proposed algorithm, JSNMNMF, and JNMF were 16.53 s, 15.14 s, and 12.49 s, respectively.
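A data set of this kind can be generated with a short sketch. It uses the small-set dimensions quoted above, draws the factor matrices as random integers from 1 to 10, and adds Gaussian noise scaled by the noise level. The function name `simulate`, the noise level 0.5, and the clipping at zero (to keep the input nonnegative for NMF) are assumptions of this sketch, not details reported in the text.

```python
import numpy as np

def simulate(n, dims, k=10, noise_level=0.0, seed=0):
    """Generate X_I = W @ H_I + noise_level * Gaussian noise for each feature dimension."""
    rng = np.random.default_rng(seed)
    W = rng.integers(1, 11, size=(n, k)).astype(float)            # integers in 1..10
    Hs = [rng.integers(1, 11, size=(k, p)).astype(float) for p in dims]
    Xs = []
    for H in Hs:
        X = W @ H
        X_noisy = X + noise_level * rng.standard_normal(X.shape)
        Xs.append(np.clip(X_noisy, 0.0, None))   # assumption: clip to stay nonnegative
    return W, Hs, Xs

# Small synthetic set: n = 100 samples; 650 sMRI, 350 SNP, and 600 gene features.
W, Hs, Xs = simulate(n=100, dims=(650, 350, 600), k=10, noise_level=0.5)

# Noise-free reference for comparison.
W0, Hs0, Xs0 = simulate(n=100, dims=(650, 350, 600), k=10, noise_level=0.0)
```

Running each factorization algorithm on `Xs` for a range of `noise_level` values, and recording the reconstruction error against the noise-free product, would reproduce the kind of comparison the text describes.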

Conclusion
NMF is a robust dimensionality reduction analysis algorithm that can integrate multiple omics data. Therefore, introducing it into imaging genetics can effectively integrate the macroscopic and microscopic characteristics of the disease and then mine the biomarkers closely related to the disease from the significant co-expression modules of the characteristics. In this paper, we proposed JCB-SNMF. This method considers the respective connectivity information of the brain and genes. Furthermore, it adds the Laplacian matrix as prior knowledge to the JSNMNMF algorithm, which improves the algorithm's anti-noise performance and biological interpretability. The proposed algorithm uses sMRI, SNP, and gene expression data in the ADNI data set. Simulation results show that the noise resistance performance of the proposed algorithm is better than that of JSNMNMF and JNMF. Experimental results on real data show that the proposed algorithm can identify and predict risk ROIs, risk SNPs, and risk genes closely related to AD and MCI. Moreover, we also found some significant SNP/gene-ROI pairs. For example, rs11918049 is related to multiple frontal regions and the central anterior gyrus. In addition, ZC3H11A and OSGEPL1 are closely related to changes in gray matter volume in multiple brain regions. In the future, we will integrate multimodal imaging (PET, CT, etc.) and genetic data (DNA methylation, etc.) to discover more complex biological mechanisms closely related to diseases.