The study included a total of 32 participants, both HIV-infected and HIV-uninfected. Seventy-eight percent of the participants were White (n=25), and most were male (n=22, 69%) (21). The median age of the study participants was 49.5 years (IQR 33-66). We applied the minimum redundancy maximum relevance (mRMR) method to rank the importance of the 26 genes that were differentially expressed between the two groups. DFFA was the most relevant gene (positive score), and TNFRSF1A was the least redundant (least negative score), as shown in Figure 1A. DFFA is a proapoptotic gene in the executioner pathway, and TNFRSF1A is a proapoptotic gene in the extrinsic pathway. To assess the discriminatory power of DFFA and TNFRSF1A, we then tested two classifier models, elastic-net and k-nearest neighbors (KNN), to classify study participants into groups (case or control) using these two genes and, for comparison, larger gene sets. Once the elastic-net model was fit, we used it to predict the outcome (case or control). Figure 1B shows that, using all 26 genes, the elastic-net model predicted the outcome with 80% accuracy. Even with a reduced set of genes, either 2 (DFFA and TNFRSF1A), 5 (DFFA, TNFRSF1A, BCL2, FASLG, and TNF), 10 (DFFA, TNFRSF1A, BCL2, FASLG, TNF, CASP14, CASP7, TRAF2, CYCS, and LTBR), or 20 (Figure 1B), we still achieved 60-80% accuracy. For the KNN classifier, Figure 1B also shows that using all 26 genes predicted the outcome with 80% accuracy. As with the elastic-net model, we tested reduced numbers of features; the KNN classifier using only the two top-ranked mRMR genes, the minimum-redundancy (TNFRSF1A) and maximum-relevance (DFFA) genes, correctly classified 90% of the participants into their respective groups.
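The mRMR ranking and the reduced-feature KNN evaluation described above can be illustrated with a minimal Python sketch. It assumes the difference form of mRMR (relevance minus mean redundancy), mutual information computed on tertile-discretized expression values, and leave-one-out cross-validation with k = 3 neighbours; none of these settings are stated in the text, and the helper names (`mrmr_rank`, `knn_loocv_accuracy`) and the simulated data are hypothetical. Under this form the first gene is chosen on relevance alone, while later genes can receive negative scores when their redundancy exceeds their relevance, which is consistent with the sign pattern of the scores in Figure 1A.

```python
import numpy as np

def discretize(x, n_bins=3):
    """Bin a continuous vector into n_bins quantile-based levels (0..n_bins-1)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, edges)

def mutual_info(a, b):
    """Mutual information (in nats) between two discrete integer vectors."""
    mi = 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            p_ab = np.mean((a == va) & (b == vb))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(a == va) * np.mean(b == vb)))
    return mi

def mrmr_rank(X, y, k):
    """Greedy mRMR (difference form): pick k column indices of X."""
    Xd = np.column_stack([discretize(X[:, j]) for j in range(X.shape[1])])
    relevance = np.array([mutual_info(Xd[:, j], y) for j in range(Xd.shape[1])])
    selected = [int(np.argmax(relevance))]  # first gene: maximum relevance only
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(Xd.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(Xd[:, j], Xd[:, s]) for s in selected])
            score = relevance[j] - redundancy  # may be negative
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

def knn_loocv_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a Euclidean k-nearest-neighbour majority vote."""
    correct = 0
    for i in range(len(y)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the held-out participant itself
        neighbours = np.argsort(d)[:k]
        vote = np.bincount(y[neighbours], minlength=2).argmax()
        correct += int(vote == y[i])
    return correct / len(y)

# Hypothetical data: 16 controls (0) and 16 cases (1), 26 genes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 16)
X = rng.normal(size=(32, 26))
X[:, 5] += 4.0 * y  # make gene 5 strongly informative
top2 = mrmr_rank(X, y, k=2)
acc = knn_loocv_accuracy(X[:, top2], y, k=3)
```

Leave-one-out validation is used here only because it suits very small samples; the text does not say how the reported accuracies were validated.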
Estimating Networks
We estimated a graphical LASSO (GLASSO) network of all 26 genes, in which edges represent partial correlations between genes (Figure 2). Of the 26 genes, 18 were proapoptotic (TNFRSF1A, CYCS, DFFA, ABL1, LTBR, CASP7, FASLG, BAD, TRAF2, BAK1, CIDEA, TNFRSF11B, CASP14, BIK, GADD45A, CASP5, CD70, and TNFRSF9), 5 were antiapoptotic (BCL2, BRAF, BIRC5, IL10, and NOL3), and 3 had dual functions (CD27, HRK, and TNF). We also estimated GLASSO networks separately for cases and controls (Figure 3), computed centrality measures, and assessed the stability of each network.
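In a GLASSO network, the edge weights are partial correlations derived from the estimated sparse precision (inverse covariance) matrix Θ, via ρ_ij = -θ_ij / sqrt(θ_ii θ_jj). The sketch below shows only this conversion step; the GLASSO fit itself (for example, the `precision_` attribute of scikit-learn's `GraphicalLasso`) is assumed to have been done already, and the 3-gene precision matrix is hypothetical.

```python
import numpy as np

def precision_to_partial_corr(theta):
    """Convert a precision matrix into a partial-correlation weight matrix.

    Edge weight between genes i and j: -theta_ij / sqrt(theta_ii * theta_jj).
    """
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)  # no self-loops in the network
    return pcor

# Hypothetical 3-gene precision matrix; in the study this would be the
# sparse GLASSO estimate for 26 genes.
theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -0.5],
                  [0.0, -0.5, 1.0]])
pcor = precision_to_partial_corr(theta)

# A zero in theta means no edge (conditional independence given the others).
n_genes = 26
n_possible_edges = n_genes * (n_genes - 1) // 2  # 325 candidate edges
```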
Cases
The case network (Figure 3) showed the strongest positive edge-weights between DFFA and TNFRSF1A and between CYCS and BCL2, and negative edge-weights between BCL2 and TNFRSF1A, between BCL2 and FASLG, and between FASLG and CIDEA. The accuracy of the edge-weights was evaluated with bootstrapped confidence intervals (CIs). The bootstrapped CIs were wide, suggesting that many of the edge-weights did not differ significantly from one another. However, the CIs for the CD27-FASLG and FASLG-TNFRSF1A edges did not overlap with the bootstrapped CIs of the other edges, so these were likely the strongest edges. Network stability decreased as the sample size was reduced. Among all 26 genes, FASLG had the highest strength, betweenness, and closeness (Figure 4, red), suggesting that it was the most central gene in the case network, with the most interactions with other genes.
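The bootstrapped edge-weight CIs can be illustrated with a minimal nonparametric bootstrap sketch. It assumes percentile intervals and resampling of participants (rows) with replacement; the function names are hypothetical, and `corr_network` (plain Pearson correlations) is only a stand-in for the GLASSO estimator used in the study. The overlap helper mirrors the informal comparison in the text: edges whose CIs do not overlap with those of other edges are the ones most likely to truly differ in strength.

```python
import numpy as np

def bootstrap_edge_ci(data, estimate_network, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for every edge weight.

    data: (n_samples, n_genes) array.
    estimate_network: function mapping such an array to an
    (n_genes, n_genes) edge-weight matrix.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    boots = np.stack([
        estimate_network(data[rng.integers(0, n, size=n)])  # resample rows
        for _ in range(n_boot)
    ])
    lower = np.quantile(boots, alpha / 2, axis=0)
    upper = np.quantile(boots, 1 - alpha / 2, axis=0)
    return lower, upper

def intervals_overlap(lo_a, hi_a, lo_b, hi_b):
    """True when the intervals [lo_a, hi_a] and [lo_b, hi_b] share any value."""
    return lo_a <= hi_b and lo_b <= hi_a

def corr_network(x):
    """Hypothetical stand-in estimator: Pearson correlations, zero diagonal."""
    w = np.corrcoef(x, rowvar=False)
    np.fill_diagonal(w, 0.0)
    return w

# Hypothetical data: 16 participants, 4 genes, genes 0 and 1 strongly linked.
rng = np.random.default_rng(1)
data = rng.normal(size=(16, 4))
data[:, 1] += 2.0 * data[:, 0]
lower, upper = bootstrap_edge_ci(data, corr_network, n_boot=200)
```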
Controls
Figure 3 also shows the control network. Strong edge-weights were found between LTBR and TNF, DFFA and ABL1, HRK and CASP7, BCL2 and BAD, and CYCS and BCL2, whereas weak negative edge-weights were found between TNF and ABL1, LTBR and CD70, CYCS and TRAF2, and CYCS and LTBR. The edge-weight accuracy analysis showed that most edge-weights did not differ significantly from one another. Strength centrality became unstable as the sample size decreased. The centrality indices plot (Figure 4, teal) and the centrality scores showed that ABL1 was the most central gene in the control network.
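Strength and closeness, two of the centrality indices reported in Figure 4, can be computed directly from a weighted adjacency matrix. The sketch below assumes a common convention (used, for example, by the R package qgraph) in which strength is the sum of absolute edge weights and path lengths are the inverse of absolute edge weights; betweenness is omitted for brevity, and the three-gene matrix is hypothetical.

```python
import numpy as np

def strength(w):
    """Strength centrality: sum of absolute edge weights attached to each node."""
    return np.abs(w).sum(axis=1)

def closeness(w):
    """Closeness: inverse of the summed shortest-path lengths to all other nodes.

    Edge length is taken as 1/|weight| (stronger edge = shorter path);
    absent edges get infinite length. Shortest paths via Floyd-Warshall.
    """
    a = np.abs(w)
    n = a.shape[0]
    with np.errstate(divide="ignore"):
        dist = np.where(a > 0, 1.0 / a, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # allow paths that pass through node k
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return 1.0 / dist.sum(axis=1)  # 0.0 for a node that cannot reach all others

# Hypothetical chain of three genes: 0 -(0.5)- 1 -(-0.25)- 2
w = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, -0.25],
              [0.0, -0.25, 0.0]])
s = strength(w)   # [0.5, 0.75, 0.25]
c = closeness(w)  # [1/8, 1/6, 1/10]
```

Taking absolute values means a strong negative edge contributes to centrality as much as a strong positive one, which is why both positive and negative edges can make a gene such as ABL1 or FASLG central.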
Network comparison
To further analyze the overall differences between the two networks, we performed a network comparison test to examine differences in connection weights. Ninety-two of the 325 possible connections (26 × 25 / 2) differed significantly between the networks. In addition, the edges linking ABL1, the gene with the highest strength centrality in controls, to TNFRSF1A, CASP14, TRAF2, CASP5, TNFRSF9, and DFFA differed significantly between the two networks (p<0.005). For FASLG, the gene with the highest strength centrality in cases, the edge to TNFRSF1A was the only one that differed significantly between the networks (p<0.001). A paired t-test showed that global strength differed significantly between the two networks (p<0.05), with the control network having more significant edge-weights between nodes than the case network.
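The text reports a paired t-test for global strength. Network comparison tests, such as the one in the R package NetworkComparisonTest, instead assess global strength invariance by permuting group labels; the sketch below illustrates only that permutation idea under stated assumptions. Global strength is taken as the sum of absolute edge weights over the upper triangle, `corr_network` is a hypothetical stand-in for GLASSO, and the data are simulated. With 26 genes, the same upper-triangle indexing covers the 325 candidate edges compared in the study.

```python
import numpy as np

def global_strength(w):
    """Sum of absolute edge weights, counting each undirected edge once."""
    return np.abs(w[np.triu_indices_from(w, k=1)]).sum()

def corr_network(x):
    """Hypothetical stand-in for GLASSO: Pearson correlations, zero diagonal."""
    w = np.corrcoef(x, rowvar=False)
    np.fill_diagonal(w, 0.0)
    return w

def permutation_strength_test(x_a, x_b, estimate_network, n_perm=500, seed=0):
    """Permutation p-value for the absolute difference in global strength."""
    rng = np.random.default_rng(seed)
    observed = abs(global_strength(estimate_network(x_a))
                   - global_strength(estimate_network(x_b)))
    pooled = np.vstack([x_a, x_b])
    n_a = x_a.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])  # shuffle group membership
        diff = abs(global_strength(estimate_network(pooled[idx[:n_a]]))
                   - global_strength(estimate_network(pooled[idx[n_a:]])))
        exceed += int(diff >= observed)
    # add-one correction keeps the p-value strictly positive
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical data: 4 genes in 20 "cases" (intercorrelated) and 20 "controls".
rng = np.random.default_rng(2)
shared = rng.normal(size=(20, 1))
x_cases = shared + 0.3 * rng.normal(size=(20, 4))
x_controls = rng.normal(size=(20, 4))
observed, p_value = permutation_strength_test(x_cases, x_controls,
                                              corr_network, n_perm=200)
```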
Community Detection in Cases and Controls
We performed community detection with the Spinglass algorithm separately for cases and controls (Figure 5). The algorithm identified 3 clusters in cases and 5 in controls. For cases, DFFA, TNFRSF1A, BCL2, CYCS, and ABL1 were in cluster one (blue in Figure 5); FASLG, BRAF, NOL3, IL10, CD70, TNFRSF9, CASP5, CD27, LTBR, TRAF2, CASP7, and TNF were in cluster two (green in Figure 5); and CASP14, HRK, GADD45A, BIRC5, BAK1, BIK, BAD, TNFRSF11B, and CIDEA were in cluster three (light red in Figure 5). For controls, DFFA, FASLG, TRAF2, and ABL1 were in cluster one (orange in Figure 5); CASP14, IL10, CD70, BIRC5, TNFRSF9, BIK, TNFRSF11B, CIDEA, CASP5, and CD27 were in cluster two (light red in Figure 5); TNF, BRAF, HRK, BAK1, LTBR, and CASP7 were in cluster three (green in Figure 5); TNFRSF1A, BAD, CYCS, and BCL2 were in cluster four (blue in Figure 5); and NOL3 and GADD45A were in cluster five (yellow in Figure 5).
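The Spinglass algorithm (Reichardt and Bornholdt), available for example as `cluster_spinglass` in the R igraph package and `Graph.community_spinglass` in python-igraph, searches for a partition by simulated annealing on a Potts-model Hamiltonian; with its configuration null model and gamma = 1, that objective corresponds to Newman modularity. The sketch below does not run the annealing search; it only scores a candidate partition by weighted modularity, assuming edge signs are dropped (absolute weights). The toy network and memberships are hypothetical.

```python
import numpy as np

def modularity(w, membership):
    """Newman modularity Q of a partition on a weighted, undirected network.

    Assumption: edge signs are dropped (absolute weights are used).
    """
    a = np.abs(np.asarray(w, dtype=float))
    np.fill_diagonal(a, 0.0)  # ignore self-loops
    k = a.sum(axis=1)  # weighted degree of each node
    two_m = a.sum()  # twice the total edge weight
    same = np.equal.outer(membership, membership)  # True where i, j share a cluster
    return ((a - np.outer(k, k) / two_m) * same).sum() / two_m

# Hypothetical toy network: two triangles joined by one bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = np.zeros((6, 6))
for i, j in edges:
    w[i, j] = w[j, i] = 1.0

q_split = modularity(w, np.array([0, 0, 0, 1, 1, 1]))  # 5/14, about 0.357
q_single = modularity(w, np.zeros(6, dtype=int))  # one cluster scores 0
```

A higher Q means more edge weight falls within clusters than expected by chance, which is the sense in which the 3 case clusters and 5 control clusters group genes that are more strongly connected to each other than to the rest of the network.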