To analyze the interactions between SARS-CoV-2 and cell proteins, data from the Protein Atlas were collected as described in the Methods section. A total interactome of 499,620 interactions (edges) and 20,091 proteins (nodes) was generated and then restricted to interactions reported two or more times, resulting in 77,351 protein-protein interactions (PPIs) and 13,035 proteins.
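The reduction to interactions reported two or more times can be illustrated with a minimal Python sketch. It assumes that each report is an unordered pair of protein identifiers and that repeated reports of the same pair appear as repeated records. The function name `filter_replicated` and the placeholder identifiers are hypothetical and are not taken from the Methods.

```python
from collections import Counter

def filter_replicated(records, min_reports=2):
    """Keep undirected edges reported at least `min_reports` times."""
    # Assumption: (A, B) and (B, A) describe the same interaction.
    counts = Counter(frozenset(pair) for pair in records)
    edges = {edge for edge, n in counts.items() if n >= min_reports}
    # Nodes are the proteins that still take part in a retained edge.
    nodes = set().union(*edges) if edges else set()
    return edges, nodes

# Placeholder records (hypothetical identifiers, not real interactome data).
records = [("P1", "P2"), ("P2", "P1"), ("P1", "P3"), ("P3", "P4"), ("P3", "P4")]
edges, nodes = filter_replicated(records)
print(len(edges), len(nodes))
```

In this toy input, the pair P1-P3 is reported only once and is therefore dropped, while the two replicated pairs are kept.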
Input tropism
Of the 62 tissues reported in the RNA consensus tissue gene data, 57 expressed at least three of the seven input genes, a combination of protein S-binding proteins (ACE2, NRP1 or BSG) and proteases (TMPRSS2, CTSL, CTSB or FURIN). Of the 57 tissues (145 cell types) reported in the normal tissue data, 45 had at least three of the seven proteins. The two data sets shared 39 tissues, containing 69 different cell types (see Figs. 1 and 2b). Six cell types, such as bronchus, nasopharynx or oral mucosa, had no transcriptome report at the tissue level, and 23 tissues, such as the olfactory region, had no cell type information. For SARS-CoV-1, the results were similar, although only four proteins were considered (ACE2, BSG, TMPRSS2 and CTSL).
Of the 39 tissues identified in the analysis for SARS-CoV-2, 10 (16 cell types) showed neither protein nor mRNA evidence of ACE2 and TMPRSS2: adrenal gland (glandular cells), bone marrow (hematopoietic cells), cerebellum (granular layer cells, molecular layer cells and Purkinje cells), cerebral cortex (endothelial cells, glial cells, neuronal cells and neuropil), endometrium (glandular cells 1 and 2), lymph node (germinal and non-germinal center cells), parathyroid gland (glandular cells), skeletal muscle (myocytes), smooth muscle (smooth muscle cells) and spleen (red pulp cells); see Fig. 2a. Figure 2a also shows that bone marrow did not express ACE2 or TMPRSS2 and that several tissues had below-threshold values (NX < 1) of ACE2, such as lung (0.8). At this point, no cell type was removed. In contrast, for SARS-CoV-1, we removed 14 cell types with neither ACE2 nor TMPRSS2 at the protein or mRNA level; BSG and CTSL only had expression data from mRNA (see Fig. 2c). These results show that, of the seven input proteins, ACE2 and TMPRSS2 are the least frequently present across the 69 cell types.
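The tissue-level selection described above (at least three of the seven input genes expressed) can be sketched as follows. This is only an illustration. It assumes that a gene counts as expressed when its NX value is at least 1, and the function name `tissues_with_inputs`, the tissue labels and the NX values are hypothetical.

```python
INPUTS = ("ACE2", "NRP1", "BSG", "TMPRSS2", "CTSL", "CTSB", "FURIN")

def tissues_with_inputs(nx_by_tissue, min_inputs=3, threshold=1.0):
    """Return tissues expressing at least `min_inputs` of the input genes."""
    selected = []
    for tissue, nx in nx_by_tissue.items():
        # Assumption: a missing gene is treated as not expressed (NX = 0).
        expressed = [g for g in INPUTS if nx.get(g, 0.0) >= threshold]
        if len(expressed) >= min_inputs:
            selected.append(tissue)
    return selected

# Hypothetical NX values, for illustration only.
nx_by_tissue = {
    "tissue_A": {"ACE2": 0.8, "BSG": 30.0, "CTSL": 12.0, "FURIN": 5.0},
    "tissue_B": {"ACE2": 0.2, "BSG": 4.0},
}
selected = tissues_with_inputs(nx_by_tissue)
print(selected)
```

Under this criterion a tissue can be retained even when ACE2 itself is below threshold, as long as three other inputs are expressed.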
PPI tropism
To investigate differences in the PPI networks that could occur in infected cells, we compared the size of the interactomes inferred for each cell type. The cells showing the highest number of interactions with SARS-CoV-2 proteins were in the kidney (tubule cells and glomerulus cells), small intestine (glandular cells) and lung (pneumocytes), with a maximum of 970 proteins out of the 1345 reported. The least connected cells were bone marrow (hematopoietic cells), colon (peripheral nerve/ganglion), placenta (decidual cells) and ovary (follicular cells); see Supplementary Table S1. For SARS-CoV-1, the least connected cells were the same as for SARS-CoV-2, and the most connected cells were lung (pneumocytes) and glandular cells (thyroid gland, duodenum, salivary gland, small intestine, gall bladder, colon, and pancreas), with 114 proteins (out of 151) interacting with the 26 viral proteins (see Supplementary Table S2).
It was initially thought that infection would be associated with an expression profile. When the cell types were clustered by expression value, a group of cells with a higher presence of several proteins was identified (see Supplementary Fig. S1). Comparison with the unsupervised hierarchical clustering in Supplementary Fig. S2 showed that these cell types were mainly glandular. A similar pattern was observed for SARS-CoV-1 (Supplementary Fig. S1c). In addition to glandular cells, this clustering groups cells according to tissue type (e.g., skin, brain).
To characterize the dynamics of each cell according to its PPIs, the corresponding networks were represented. Using the average degree (ranging from 4.561 to 7.391) as a measure of the connectivity of each network, we determined that the most connected cellular networks were pneumocytes, glandular cells (endometrium 1), squamous epithelial cells (esophagus), non-germinal center cells (lymph node and tonsil tissues) and cells in red pulp (spleen); see Supplementary Table S1. To visualize emblematic cells, Fig. 3 shows the networks at the V-H PPI level for the lung (pneumocytes and macrophages), as the most affected tissue, and for the small intestine (glandular cells), as the tissue with the highest expression of ACE2. These networks take into account the expression in each cell and the total connectivity of each protein. The most linked SARS-CoV-2 proteins are M, nsp7b and orf3a, while the most connected human proteins are LARP7 and GOLGA2. If we focus on direct V-H PPIs (without H-H edges), the most connected proteins are ATP1A1, ATP2A2, HACD3, XPO1 and ATP5F1B. As identified in Supplementary Figs. S1 and S2, glandular cells (small intestine) show increased expression of several proteins. Figure 3G, 3H and 3I show the networks of the same tissues for SARS-CoV-1, with UBC as the protein with the most connections and an expression pattern similar to that of SARS-CoV-2.
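For reference, the average degree of an undirected network equals twice the number of edges divided by the number of nodes. The sketch below computes it from an edge list. It assumes a simple undirected graph without self-loops and counts only nodes that appear in at least one edge; the function name `average_degree` and the toy edges are hypothetical, and the network software used for the values reported here may apply a different convention.

```python
def average_degree(edges):
    """Average degree of an undirected graph given as (u, v) pairs."""
    degree = {}
    for u, v in edges:
        # Each edge contributes one to the degree of both endpoints.
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree.values()) / len(degree) if degree else 0.0

# Toy network (hypothetical nodes): 4 edges over 4 nodes.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(average_degree(toy_edges))  # 2 * 4 edges / 4 nodes = 2.0
```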
For the proteome of each cell type, a gene set enrichment analysis (GSEA) was performed. Unsupervised clustering (binary distance and the single linkage method) allowed the identification of a "gradient" of cells, along which molecular processes are altered from a lower to a higher degree. Figure 4 shows the KEGG signaling pathways for SARS-CoV-2 and SARS-CoV-1. In both heatmaps, an effect of proteome size was identified (Supplementary Tables S1 and S2); thus, 14 cell types were removed, reducing the initial 69 cell types to 55. For SARS-CoV-1, lung (macrophages) and tonsil (squamous epithelial cells) were also identified as not significantly altering the molecular environment and were therefore also removed from subsequent analyses. Hence, 55 cell types remained for SARS-CoV-2 and 45 for SARS-CoV-1. Supplementary Figure S3 shows the behavior of the remaining SARS-CoV-2 cells in terms of molecular functions and biological processes. With these results, we identified that salivary gland (glandular cells), esophagus (squamous epithelial cells), testis (Leydig cells), lung (pneumocytes) and small intestine (glandular cells) behave as a group (identifiable in molecular functions, dark blue branch, where they share a group of unaltered functions). Supplementary Figure S4 shows the effect of this reduction on SARS-CoV-1.
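The clustering step can be illustrated with a small pure-Python sketch. It assumes that each cell type is summarized as a presence/absence (0/1) profile of enriched terms, that "binary distance" means the Jaccard-type distance on such profiles, and that the linkage method named above corresponds to single (nearest-neighbour) linkage. The function names and the toy profiles are hypothetical, and the analysis reported here may have been run with different software.

```python
def binary_distance(a, b):
    """Share of positions where only one profile is on, among those where any is on."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    both = sum(1 for x, y in zip(a, b) if x and y)
    # Assumption: two all-zero profiles are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either

def single_linkage(profiles):
    """Return the merge order as (cluster_a, cluster_b, distance) tuples."""
    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(binary_distance(profiles[p], profiles[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Toy presence/absence profiles (hypothetical), one column per enriched term.
profiles = {
    "cell_A": [1, 1, 0, 0],
    "cell_B": [1, 1, 1, 0],
    "cell_C": [0, 0, 1, 1],
}
merges = single_linkage(profiles)
for left, right, dist in merges:
    print(sorted(left), sorted(right), round(dist, 3))
```

In the toy example, cell_A and cell_B share most terms and merge first; cell_C then joins through its nearest member, cell_B.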
Supplementary Figure S5 shows the 50 most discriminating results, in which the profiles of the "gradients" described above can be identified. For SARS-CoV-2, one signaling pathway that appears relevant is "Ribosome biogenesis in eukaryotes" (Supplementary Fig. S5b). Given its relevance, subnetworks of ribosomes and of their biogenesis were generated. The ribosome network shows a profile like that observed in Fig. 4 and Supplementary Fig. S1: high protein expression values in the small intestine (glandular cells), moderate-high in the lung (macrophages), moderate in the kidney (tubule cells) and moderate-low in the lung (pneumocytes). In the "Ribosome biogenesis in eukaryotes" networks, we identified that the behavior of pneumocytes is more like that of the small intestine (glandular cells), especially in the expression value, as is the case for XPO1, which is highly connected to SARS-CoV-2 proteins (see Fig. 5).
Like ribosome biogenesis in eukaryotes, the coagulation cascade appears in several results (Fig. 4a). Both signaling pathways are shown in Fig. 6 to visualize the degree of involvement at the pneumocyte and systemic level. In the coagulation cascade, FGA, FGB, PROS1, PLAU, PLAT, PROCR, and CPB2 interact directly with virus proteins, and 20 proteins interact indirectly. In ribosome biogenesis in eukaryotes, XPO1, CSNK2B, XRN2, DKC1, NOP56, NVL, RBM28, NAT10, TBL3, and MPHOSPH10 interact directly with virus proteins.
From the centroids of the 55 SARS-CoV-2 cell types (45 for SARS-CoV-1), 8 clusters were identified, separated into 4 quadrants. The distribution of expression value and PPI highlighted the importance of connectivity associated with the protein level. Quadrants 1 and 4 (high PPI) of Fig. 7 are the most important; they contain most of the cells of the tissues most frequently reported in the literature, such as pneumocytes, kidney (cells in tubules and cells in glomeruli), testis (cells in seminiferous ducts), cardiac muscle (myocytes), and liver, among others.
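As a rough illustration of the centroid and quadrant assignment, the sketch below assumes that each protein of a cell type is described by a pair (PPI, expression value), that the centroid is the mean of these pairs, and that quadrants follow the standard numbering with PPI on the horizontal axis, which is consistent with quadrants 1 and 4 being the high-PPI ones. The origin used to split the quadrants, the function names and the toy values are hypothetical; the grouping into 8 clusters is not reproduced.

```python
from statistics import mean

def centroid(points):
    """Mean (ppi, expression) over the proteins of one cell type."""
    xs, ys = zip(*points)
    return mean(xs), mean(ys)

def quadrant(point, origin):
    """Standard numbering: 1 high/high, 2 low x/high y, 3 low/low, 4 high x/low y."""
    x, y = point
    x0, y0 = origin
    if x >= x0:
        return 1 if y >= y0 else 4
    return 2 if y >= y0 else 3

# Toy (ppi, expression) pairs per cell type (hypothetical values).
cells = {
    "cell_A": [(10, 5.0), (20, 7.0)],
    "cell_B": [(2, 1.0), (4, 3.0)],
}
centroids = {name: centroid(points) for name, points in cells.items()}
# Assumption: the quadrants are split at the mean of the centroids.
origin = centroid(list(centroids.values()))
quadrants = {name: quadrant(c, origin) for name, c in centroids.items()}
print(centroids, quadrants)
```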