Data processing
The Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/) contains sequencing data submitted by scientists globally. Three AD-related datasets were selected from it for analysis and validation in this study. GSE5281, based on the GPL570 platform, includes brain tissue samples from 87 AD patients and 74 healthy individuals. It is referred to as set 1 and was used to validate the expression of DEGs in AD patients. The GSE5281 samples were also divided into STX17 high-expression and low-expression groups at the median expression level of STX17 to form set 2, which was used to screen for DEGs significantly associated with STX17 expression. Hippocampal tissue sequencing samples in GSE48350, also based on the GPL570 platform, formed set 3. This set includes brain tissue samples from 18 AD patients and 43 healthy individuals and was used to construct a co-expression network. The top 75% of genes by median absolute deviation were selected from this dataset. GSE33000, based on the GPL4372 platform, includes brain tissue samples from 310 AD patients and 157 healthy individuals and was used to validate the fit between STX17 and its related genes in AD. The “justRMA” function in “Affy” was used for normalization of GSE5281 and GSE48350. The “limma” package was used for background correction and standardization of the GSE33000 dataset. HAMdb (http://hamdb.scbdd.com/) contains 797 autophagy-related genes. Because all datasets used in this study were obtained from the publicly accessible GEO database, ethics review and approval were not required.
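The two selection steps described above (retaining the top 75% of genes by median absolute deviation, and dividing samples at the median STX17 expression level) can be sketched as follows. This is only an illustrative Python sketch, not the R workflow used in the study. The gene names, sample identifiers, and expression values are hypothetical, and assigning samples exactly at the median to the low group is an assumption, because the text does not specify how ties were handled.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of one gene's expression values."""
    m = median(values)
    return median(abs(v - m) for v in values)

def top_fraction_by_mad(expr, fraction=0.75):
    """Return the genes with the largest MAD, keeping the given fraction.

    expr maps a gene symbol to a list of expression values (one per sample).
    """
    ranked = sorted(expr, key=lambda g: mad(expr[g]), reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]

def split_by_median(levels):
    """Label each sample 'high' or 'low' relative to the median level.

    levels maps a sample id to the expression value of the gene of interest.
    Samples exactly at the median are labeled 'low' (an assumption).
    """
    cutoff = median(levels.values())
    return {s: ("high" if v > cutoff else "low") for s, v in levels.items()}

# Hypothetical toy data, for illustration only.
toy_expr = {
    "GENE_A": [1.0, 5.0, 9.0, 13.0],
    "GENE_B": [4.0, 4.1, 4.2, 4.3],
    "GENE_C": [2.0, 3.0, 6.0, 7.0],
    "GENE_D": [5.0, 5.0, 5.0, 5.0],
}
kept = top_fraction_by_mad(toy_expr)
groups = split_by_median({"s1": 7.2, "s2": 8.9, "s3": 6.5, "s4": 9.4})
```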
Identification of DEGs
The probes in the GSE5281 dataset were converted to gene symbols for subsequent analysis. The “limma” package in R 4.0.3 was used to screen for genes differentially expressed between AD and healthy tissues in set 1, and between the STX17 high-expression and STX17 low-expression groups in set 2. The screening criteria were a corrected P-value (false discovery rate, FDR) ≤ 0.05 and |log2FC| ≥ 0.4, where FC is the fold change. The “pheatmap” package in R was used to plot heat maps of the DEGs, and the “ggplot2” package was used to plot volcano plots.
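The screening criteria can be illustrated with a minimal Python sketch. It assumes that the corrected P-value is a Benjamini-Hochberg FDR, which the text does not state explicitly. The input tuples stand in for the per-gene output of a differential expression analysis, and all gene names and values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def screen_degs(results, fdr_cutoff=0.05, lfc_cutoff=0.4):
    """Apply FDR <= fdr_cutoff and |log2FC| >= lfc_cutoff.

    results is a list of (gene, log2fc, raw_p) tuples.
    Returns a dict mapping each retained gene to 'up' or 'down'.
    """
    fdr = benjamini_hochberg([p for _, _, p in results])
    degs = {}
    for (gene, lfc, _), q in zip(results, fdr):
        if q <= fdr_cutoff and abs(lfc) >= lfc_cutoff:
            degs[gene] = "up" if lfc > 0 else "down"
    return degs

# Hypothetical per-gene statistics, for illustration only.
toy = [("G1", 1.2, 0.001), ("G2", -0.9, 0.004),
       ("G3", 0.1, 0.002), ("G4", 0.8, 0.6)]
degs = screen_degs(toy)
```

In this toy input, G3 is excluded by the fold-change threshold and G4 by the FDR threshold.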
Experimental animals
Nine-month-old male APP/PS1 mice weighing 20 ± 3 g were housed under suitable temperature and humidity and given ad libitum access to food and water. All animal experiments in this study were approved by the Ethics Committee of Shanxi Medical University and conformed to the National Experimental Animal Usage Regulations.
Primary reagents used in animal experiments
STX17 antibody (Proteintech, USA, 17815-1-P), GAPDH antibody (Beijing Zhongshan Golden Bridge Biotechnology, China, bsm-33033M), β-actin antibody (Beijing Zhongshan Golden Bridge Biotechnology, China, TA-09), horseradish peroxidase-labeled goat anti-mouse IgG (Beijing Zhongshan Golden Bridge Biotechnology, China, ZB-2305), SDS-PAGE horseradish peroxidase-labeled goat anti-rabbit IgG (Beijing Zhongshan Golden Bridge Biotechnology, China, ZB-2301), PrimeScript RT Master Mix (TaKaRa, Japan, RR036A), and SYBR Premix Ex Taq™ II (TaKaRa, Japan, RR820A) were used.
Main equipment
Microplate reader (SoftMax, USA, SMP500-071 47-HLXU), low temperature high-speed centrifuge (Thermo Fisher, USA, LR56495), PCR machine (Stratagene, USA, MX3005P), thermal cycler (MJ Research, USA, MX3005P), vertical electrophoresis tank (BIO-RAD, USA, 042BR11805), semi-dry membrane transfer cell (BIO-RAD, USA, 221BR22693), and gel imaging system (UVP, USA, BioSpectrum 810) were used.
Western blot
BSA was used to measure protein concentration in the extracted tissue supernatant. Proteins were resolved by SDS-PAGE, transferred to a PVDF membrane, and blocked with 5% skimmed milk at room temperature for 2 h. The membranes were then washed three times with TBST, primary antibodies were added, and the membranes were incubated at 4°C overnight. The next day, the membranes were washed three times with TBST, secondary antibodies were added, and the membranes were incubated at 4°C for 2 h. The membranes were then washed three times. Super ECL Plus (volume of solution A: volume of solution B = 1:1) was mixed evenly before being added to the PVDF membranes (around 200 µl). Exposure and image acquisition were performed in the BioSpectrum 810 gel imaging system, and ImageJ software was used for analysis.
Real-time PCR
An appropriate amount of hippocampal tissue was cut, and 300 µl Trizol was added for sufficient lysis before homogenization (60 s, 60 Hz). The lysate was then centrifuged (12000 rpm, 4°C, 5 min) and the supernatant was collected for total RNA extraction, RNA dissolution, and concentration measurement. RNA was reverse transcribed into a cDNA template, and SYBR green was used to amplify the cDNA product. All results were normalized to GAPDH expression in the control group. The 2^(-ΔΔCt) method was used to calculate the relative mRNA expression level of the target gene.
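As a worked illustration of the relative quantification step, the following Python sketch computes 2^(-ΔΔCt) from four Ct values. The function and argument names are hypothetical and the Ct values are invented for the example. GAPDH serves as the reference gene, as described above.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene by the 2^(-ddCt) method.

    ct_target, ct_ref: Ct of the target and reference gene (e.g. GAPDH)
    in the sample of interest.
    ct_target_ctrl, ct_ref_ctrl: the same two Ct values in the control group.
    """
    delta_sample = ct_target - ct_ref             # dCt in the sample
    delta_control = ct_target_ctrl - ct_ref_ctrl  # dCt in the control
    delta_delta = delta_sample - delta_control    # ddCt
    return 2 ** (-delta_delta)

# Invented Ct values: the target amplifies one cycle later in the sample
# than in the control, relative to the reference gene.
fold = relative_expression(26.0, 18.0, 25.0, 18.0)
```

With these values, ΔΔCt = (26 - 18) - (25 - 18) = 1, so the relative expression is 2^(-1) = 0.5, half the control level.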
PPI network construction and core gene screening
Key terms from the DEG enrichment analysis were selected and their genes were extracted. The STRING database (https://cn.string-db.org/) was used to construct protein-protein interaction (PPI) networks and to evaluate the interactions. Cytoscape 3.8.2 was used for further analysis and visualization of the interaction networks. To identify the most significant modules, the MCODE plugin in Cytoscape was used, and modules with scores greater than 4 were considered key modules.
Construction of weighted gene co-expression networks
To better understand AD gene expression and interactions, the “WGCNA” package in R software was used for weighted co-expression analysis of the GSE48350 dataset. Hierarchical clustering of the samples revealed no outliers. A scale-free network was then constructed based on gene expression levels, and the “pickSoftThreshold” function was used to select a soft-thresholding power of β = 7. Next, an adjacency matrix was constructed to describe the correlation between genes, and the adjacency matrix was transformed into a topological overlap matrix (TOM) to describe the similarity between nodes. The dynamic tree cut method was used to divide genes with similar expression profiles into different modules, and the minimum number of genes in each module was set to 30 (MinModuleSize = 30). The modules were visualized as dendrograms, and a correlation coefficient matrix between module eigengenes (MEs) and clinical traits was computed. The module with the highest correlation with AD was taken as the optimal module. The correlation between gene significance (GS) and module membership (MM) in this module was assessed, and genes with GS > 0.2 and MM > 0.8 in the module were considered central genes.
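Two parts of this procedure are simple enough to sketch directly: the soft-thresholded adjacency and the GS/MM filter for central genes. The Python sketch below is illustrative only and does not reproduce the “WGCNA” package. It assumes an unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to the power β. The gene names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(expr, beta=7):
    """Unsigned soft-thresholded adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    expr maps a gene symbol to its expression values across samples.
    Returns a dict keyed by ordered gene pairs (g, h) with g != h.
    """
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes for h in genes if g != h
    }

def central_genes(stats, gs_cutoff=0.2, mm_cutoff=0.8):
    """Keep genes with GS > gs_cutoff and MM > mm_cutoff.

    stats maps a gene symbol to a (GS, MM) tuple.
    """
    return [g for g, (gs, mm) in stats.items()
            if gs > gs_cutoff and mm > mm_cutoff]

# Hypothetical toy data, for illustration only.
toy_expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
adj = adjacency(toy_expr, beta=7)
hubs = central_genes({"G1": (0.35, 0.90), "G2": (0.10, 0.95), "G3": (0.50, 0.60)})
```

Raising the correlation to the power β suppresses weak correlations: in the toy data, a correlation of 0.8 between A and C becomes an adjacency of about 0.21, while the perfect correlation between A and B stays at 1.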
Gene functional enrichment analysis
The “clusterProfiler” package in R 4.0.3 was used for enrichment analysis, including gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. GO comprises three categories: molecular function (MF), cellular component (CC), and biological process (BP). The “ggplot2” package in R software was used for visualization. The screening criteria were P < 0.01 and FDR < 0.05.
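Over-representation analysis of this kind typically scores each GO term or KEGG pathway with a hypergeometric test. The Python sketch below shows that calculation for a single term, as an illustration of the underlying statistic rather than a description of the exact settings used in this study. The function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: number of background genes
    K: number of background genes annotated to the term
    n: number of genes in the query list (e.g. DEGs)
    k: number of query genes annotated to the term
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Hypothetical counts: 5 of 5 query genes fall in a term that covers
# 5 of 10 background genes.
p = hypergeom_pvalue(5, 5, 5, 10)
```

The resulting P-values would then be corrected for multiple testing across all terms before the P and FDR thresholds are applied.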
LASSO model construction and ROC curve evaluation
LASSO regression performs variable selection and prediction at the same time, which makes it suitable for building a model from a small set of genes. The central genes were intersected with the DEGs to obtain core genes, and the expression profiles of these core genes in the GSE33000 dataset were used to construct the LASSO model with the “glmnet” package in the R software. An index was calculated from the expression levels and regression coefficients of the core genes using the following formula: index = ExpGene1 × Coef1 + ExpGene2 × Coef2 + ExpGene3 × Coef3 +…+ ExpGeneN × CoefN, where “Exp” is the gene expression level and “Coef” is the regression coefficient of the gene. To accurately identify the AD and healthy groups, the dataset was randomly divided into a training set (70%) and a test set (30%). In addition, to accurately identify the correlation between the genes and STX17, the median expression level of STX17 was used as the threshold value and the GSE33000 dataset was used to construct a second LASSO regression model. The “ROCR” package in the R software was used to construct receiver operating characteristic (ROC) curves to assess the stability and fit of the LASSO models.
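The index formula, the 70/30 split, and the area under the ROC curve can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the “glmnet” or “ROCR” implementation. The gene names and coefficients are hypothetical, the random seed is arbitrary, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than by tracing the curve.

```python
import random

def risk_index(expression, coefficients):
    """index = sum over genes of Exp(gene) * Coef(gene).

    expression maps a gene symbol to its expression level in one sample.
    coefficients maps a gene symbol to its regression coefficient.
    """
    return sum(expression[g] * c for g, c in coefficients.items())

def train_test_split(sample_ids, train_fraction=0.7, seed=0):
    """Randomly split sample ids into training and test sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(len(ids) * train_fraction))
    return ids[:cut], ids[cut:]

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample (label 1)
    scores higher than a randomly chosen negative sample (label 0);
    ties count as 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical coefficients and expression values, for illustration only.
coefs = {"GENE_A": 0.5, "GENE_B": -0.25}
index = risk_index({"GENE_A": 8.0, "GENE_B": 4.0}, coefs)
train, test = train_test_split(range(10))
perfect = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
```

In practice the coefficients would be estimated on the training set only, and the index would then be computed and evaluated on the held-out test set.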