Patients and datasets
We used the Gene Expression Omnibus[16] (GEO, https://ncbi.nlm.mih.gov/geo/), a public functional genomics database with various gene profiles in various diseases. We retrieved multiple GEO series, including mRNA (GSE144127[17] and GSE19435[18]), miRNA (GSE179816), and lncRNA (GSE163292[18, 19]) datasets. The GPL10558 platform was used to extract the data from GSE144127 microarray profiles included 101 healthy and 327 TB samples.The GSE19435 microarray profiles were generated using the GPL6947 platform and included 45 TB and 61 matched control samples. In addition, GSE144127 was used as an external dataset, while GSE19435 was used as training datasets (details in Table 1). GenomeStudio (Illumina) was used to perform quality control and generate signal intensity values. Then, we integrated the ATGs derived from HADb (http://www.autophagy.lu/index.html)[20], AUTOPHAGY DATABASE (http://www.tanpaku.org/autophagy/index.html), HAMdb (http://hamdb.scbdd.com/)[21], GSEA (https://www.gseamsigdb.org/gsea/index.jsp) and NCBI (https://www.ncbi.nlm.nih.gov/). A total of 2323 genes were extracted, among which 1906 autophagy-related genes were expressed in the GSE144127 dataset (See Supplemental Table S1).
Weighted Gene Co-Expression Network Analysis
We constructed gene co-expression networks using the R package "WGCNA"[22] and identified key functional modules. The correlation of ATGs between samples was evaluated based on module membership values>0.8 and gene significance values>0.5. A soft threshold of 9 was determined as suitable with an R2>0.9. Subsequently, a standard scale-free network was established. Pearson correlation analysis was conducted to examine the correlation of each gene module with TB disease and health characteristics. The correlation functional network was then utilized to identify tissue-specific markers. Key modules were identified by hierarchical clustering of genes using a dynamic tree-cutting strategy (shear height = 0.25). Gene information in the modules (see Supplementary Table S2).
Enrichment Analysis of Key Modules
We used significance ratings to sort the genes in the core modules from most to least important. Based on the results of the Gene Ontology (GO) annotation analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis in the functionally related modules, functional enrichment analysis was then conducted to examine, identify, and interpret various biological functions (see Supplementary Table S3-S6). The R software's "Cluster Profiler" and "ggplot2" (p-value Cut off = 0.25) were used to view and analyze the data. The essential modules are more biologically significant and have better characterization than other modules.
Construction of protein–protein interaction network
The STRING database[23] (https://string-db.org/), a search tool for the retrieval of interacting genes/proteins, was utilized to construct the protein-protein interaction (PPI) network, encompassing both experimentally validated and computationally predicted interactions. The 508 ATGs in the key modules were imported into the STRING online platform to construct a PPI network with default Settings for PPI analysis. The analysis results were imported into Cytoscape (3.10.1) software. Using the Degree method, we used CytoHubba[24] to analyze the core module enrichment of the PPI network .
Identification of hub genes
The R tool "limma" was used to analyze the changes in gene expression of 508 ATGs in the core modules, whereas the R package "ggplot2" was employed to generate the volcano diagrams. As shown in Supplementary Table S7, we used the criteria | log2FC | > 0.5 and p< 0.05 to filter the gene expression differences. The top nodes of the PPI network were intersected with the differentially expressed genes to get the gene expression in TB. We intersected the top 20 nodes in the PPI network with DE-ATGs to discover the differentially expressed Hub genes in tuberculosis. Researchers used Receiver Operating Characteristic (ROC) curve analysis to determine which genes were most useful for diagnosis. The validation dataset (GSE19435) was subjected to ROC analysis using R's "pROC" package.
Immune cell infiltration analysis
To find immunological infiltration, the CIBERSORT[25] method was used. This deconvolution method can determine the proportion of 22 different types of immune cells using a huge number of tissue samples. Based on the gene expression data, the infiltration abundance of 22 immune cells per sample was computed using the 'CIBERSORT R script v1.04' R-based script, referenced in Supplementary Table S8. At the same time, we looked at how immune cell invasion was correlated with hub gene expression levels.
Construction of ceRNA network
We utilized the following procedures to construct the ceRNA network: (1) differential analysis of GSE163292 (lncRNA database) and GSE179816 (miRNA database) with the limma package in the R environment using | log2FC | > 0.5 and p < 0.05 as screening thresholds to obtain differentially expressed microRNAs (DE-miRNAs) and differentially expressed lncRNAs (DE-lncRNAs) (see Supplementary Table S9, S10); (2) using mirNet (https://www.mirnet.ca/)[26] to predict the lncRNAs that would be targeted by the DE-miRNAs that bind to the hub gene [the predicted lncRNAs were then taken to intersect with the DElncRNAs from (1) to construct the miRNA–lncRNA axis]; and (3) combining the multiple mRNA–miRNA and miRNA–lncRNA axes obtained from the above steps into a mRNA–miRNA– lncRNA regulatory axis according to the predicted bindsing trends (see Supplementary Table S11, S12). In addition, the ceRNA network was visualized using Cytoscape software.
Drug–Gene Interaction and Molecular Docking Analysis
The medications that are now available or might be beneficial were discovered by the Drug-Gene Interaction Database (DGIdb, www.dgidb.org)[27]. For the data visualizations made using Cytoscape, see Supplementary Table S13. The PubChem database[28] and the PDB database[29] were used to construct the molecular structures of the target proteins. Pymol[30], which can distinguish between water molecules and small-molecule ligands, was used to fine-tune the targets after the sequences had been modeled using the AlphaFold2[31] program. After inputting the protein sequences, we ran protein-protein docking using the HDOCK SERVER [32]application. Using the PyMOL tool, the docked complexes were viewed when the ideal docking arrangement was settled upon.
M. tuberculosis culture
M. tuberculosis-BCG was obtained from Shihezi University (Shihezi, China). BCG was grown in Roche medium. After 21 days, the rapidly increasing BCG vaccine was collected to make a homogeneous bacterial suspension, and the BCG concentration was determined using a McColl turbidimeter.
Cell culture and M. tuberculosis infection
Procell of Wuhan, China, supplied the macrophage RAW264.7 and mononuclear THP-1 cell lines. The 6-well plates were seeded with cells at a 2×106 cells/well density in RPMI 1640 medium (Gibco, USA) or DMEM medium (Gibco, USA) with 10% FBS (Gibco, USA) before usage. Phorbol-12-myristate-13-acetate (PMA, Sigma-Aldrich, USA) was added to THP-1 cells during 24 hours of incubation at 37 ℃ with 5% CO2 to stimulate adherent differentiation of macrophages. The optimal conditions for BCG infection were a MOI of 10 and 4 h of infection time for the cells. Bacteria outside of cells were rinsed with PBS (servicebio, China). Twenty-four hours after infection, THP-1/RAW264.7 cells were cultivated.
C57BL/6 mice infection models
We bought our male C57BL/6 mice from the Chinese company Henan Scebis Biotechnology Co., Ltd. From 230 to 250 grams, they were weighed. Injecting 5×106 BCG into the tail veins of mice served as a TB infection model(n≥3), whereas PBS was used as a control group. A standard environment was maintained for the mice, which included food and water accessible at all times, a temperature of 23 ± 1°C, 60% humidity, and a 12:12 light/dark cycle with light from 08:30 to 20:30. After four weeks of intervention, the mice were sacrificed by severing their necks. Their lung tissues were preserved for use in future studies.
Real-time quantitative polymerasechain reaction
Complete RNA extraction was performed according to the manufacturer's instructions using the TransZol Up (TransGen Biotech, China). The next phase included utilizing a PrimeScript RT Reagent Kit to reverse-transcribe the RNA into cDNA with the help of TransGen Biotech of China. Amplifying cDNA using a QuantStudio real-time PCR machine developed by Thermo Fisher Scientific in the USA. The primer sequences for STAT1, CASP1, and IL1B are shown in Table 2. To find the relative expression of mRNA, the 2−ΔŔCt method was used, using ACTB normalization.
Western Blotting and Immunohistochemistry
According to the manufacturer's recommendations, the RIPA lysis buffer (Sigma, USA) was used to extract total protein from cells and tissues. After that, the BCA assay (Beyotime, China) was used to quantify the total protein concentration. After electrophoretically transferring 30 ug of total proteins onto PVDF membranes, sealed with 5% BSA, the proteins were separated on a 10% polyacrylamide gel using SDS-PAGE. The membranes were incubated at 4 ℃ overnight with diluted primary antibodies to STAT1, IL1B, and CASP1. The dilutions were 1:1000 for CST, 1:1000 for Affinity, and 1:1000 for CASP1. After that, the membrane was left to incubate at room temperature for another four hours with secondary antibodies that were HRP-labelled (Jackson, United States). At last, the protein bands were quantified and examined with the help of ImageJ.
Statistical analysis
GraphPad Prism 8.0 and R 4.2.2 were used for statistical analysis. We found significant differences between groups using an unpaired t-test with a p-value below 0.05. Hub ATG diagnostic efficacy was assessed using ROC curve analysis.