A comprehensive physical interaction map between the 1 Turnip mosaic potyvirus and Arabidopsis thaliana 2 proteomes 3

Viruses are obligate intracellular parasites that have co-evolved with their hosts to 2 establish an intricate network of protein-protein interactions. Yet, the systems-level mode 3 of action of plant viruses remains poorly understood. Here, we followed a high- 4 throughput yeast two-hybrid screening to identify 378 novel virus-host protein-protein 5 interactions between Turnip mosaic virus (TuMV), a representative plant RNA virus, and 6 Arabidopsis thaliana , one of its natural hosts. We found the RNA-dependent RNA 7 polymerase NIb as the virus protein with the largest number of contacts. We verified a 8 subset of 25 selected interactions in planta by bimolecular fluorescence complementation 9 assays. We then constructed a comprehensive network comprising 399 TuMV- A. 10 thaliana interactions to perform, together with intravirus and intrahost connections, 11 detailed computational analyses. In particular, we found that the host proteins targeted 12 by the virus participate in a higher number of infection-related functions, are more 13 connected and have an increased capacity to spread information throughout the cell 14 proteome, display higher expression levels, and have been subject to stronger purifying 15 selection than expected by chance. Overall, our results provide a comprehensive 16 mechanistic description of a plant virus-host interplay, with potential impact on disease 17 etiology, and reveal that plant and animal viruses share fundamental features in their 18 mode of action. 19

to defense. Certainly, this can be a consequence of targeting A. thaliana proteins that 1 participate in multiple functions, such as PLASMA-MEMBRANE ASSOCIATED 2 CATION-BINDING PROTEIN 1 (PCaP1), targeted by P3N-PIPO [31,32], which 3 reflects the highly intricate (mostly hormone-mediated) nature of the signaling and 4 regulatory pathways in plants [46]. Of course, NIb, with the largest number of contacts, 5 is the virus protein that impacts more host functions. The heatmap also revealed functions 6 targeted by all (or almost all) virus proteins, such as "response to stress", as well as 7 functions with a more specific relationship, such as "translation regulation" (only targeted 8 by VPg and NIb). 9 In addition, a principal component analysis with this data matrix organized the nine 10 virus proteins in a two-dimensional space according to their predicted impact on host 11 physiology ( Fig. 3c; with 74.2% of explained variance). Interestingly, we observed that 12 virus proteins that physically interact also locate closer in this space, which may be 13 indicative of a strategy evolved by the virus to coordinate the action of its proteins during 14 infection [47]. This is the case, for instance, for HC-Pro and NIa-Pro, VPg and NIb, or 15 CI and P3N-PIPO. This analysis also highlights P3 as the less related protein in terms of 16 host targets as compared to the rest of the virus proteins. Finally, we compared the 17 functions of the different virus proteins [26] with the functions of their host targets (Fig.  18 3d). In essence, the virus requires the exploitation and disruption of multiple biological 19 processes in the host to complete its cycle, involving for that a set of multifunctional 20 proteins. screening beyond its function, we performed a computational network analysis. Firstly, 25 we adopted a virus-centric approach to analyze how the virus proteome is rewired as a 1 consequence of establishing interactions with the host (Fig. 4a). We found a significant 2 enrichment in host factors targeted at least by two different virus proteins (z test, P < 3 0.0001; Fig. 4b), which suggests that such host factors can work as emergent 4 communication channels between the virus proteins [10]. While the length of the shortest 5 paths that connect any two virus proteins is invariant whether or not the host proteome is 6 considered (Fig. 4c), the number of paths increases appreciably when it is considered 7 (Fig. 4d). Of special note is the emergent relationship that is established between the 8 virus replicase NIb and the virus helicase CI, which may reflect the necessity for 9 coordination in the core replication complex [26,47]. 10 Secondly, we adopted a host-centric approach to contextualize the virus targets into 11 a model of the A. thaliana protein-protein interactome (Fig. 5a). We focused our analysis 12 on two relevant topological properties: connectivity degree (i.e., the number of direct 13 interactions that a given protein establishes with others) and average shortest path length 14 (i.e., the average length of all shortest paths that connect a given protein with the rest). 15 We found that the connectivity degree distribution for all host proteins has a scale 16 coefficient greater than for the virus targets (as illustrated by the different slopes of the 17 power laws that fit the data in Fig. 5b), which is indicative of an enhanced probability of 18 the virus to reach a hub protein. In particular, the scale coefficient dropped from 1.247 19 to 0.860, a reduction of 31.0% (comparison of slopes in an ANCOVA test, F1,21 = 9.70, 20 P = 0.0053). A similar trend has been quantified in the case of several human viruses 21 [48]. In addition, by comparing the distributions of connectivity degrees and the average 22 shortest path lengths for the virus targets and for different sets of random genes, we found 23 that the distributions associated with the virus targets are significantly shifted towards 24 higher connectivity degrees (Mann-Whitney U test, P = 0.0045; Fig. 5c) and lower path 25 P3N-PIPO (7 vs. 3, out of 13 host interactors), CI (28 vs. 6, out of 54 host interactors), or 1 NIb (101 vs. 42, out of 251 host interactors). To further explore this issue, we compared 2 the distributions of expression for the virus targets and for different sets of randomly 3 selected genes, revealing that the distribution associated with the virus targets is 4 significantly shifted towards higher expression levels (Mann-Whitney U test, P < 0.0001; 5 Fig. 6b). We repeated the comparison from expression data upon TuMV infection, 6 finding a similar trend (Mann-Whitney U test, P < 0.0001; Fig. 6c). We also noticed that, 7 on average, the expression of the virus targets marginally increases upon infection, as 8 opposed to what occurs in the null case of randomly sampled genes. 9 In addition, we found a relevant trade-off between the absolute level of differential 10 expression upon infection and the connectivity degree (Fig. 6d). That is, the higher the 11 connectivity degree of a given host protein, the lower the absolute differential expression 12 upon infection, indicating that hub proteins in the A. thaliana interactome display certain 13 robustness in expression levels in the presence of perturbations (as also observed when 14 certain animal diseases are assessed in terms of networks) [ while the genomes of species from a sister clade, Capsella rubella and Boechera 7 retrofracta, served as references for the analyses (Fig. 7a, evolutionary results for each 8 gene in Dataset S3). We found that the average pN/pS ratio is significantly smaller in the 9 set of TuMV interactors than for randomly selected proteins (Mann-Whitney's U test, P 10 = 0.0016; Fig. 7b); thus, indicating that these interactors are under strong purifying 11 selection and particularly well conserved. Moreover, we estimated the proportion of 12 adaptive nonsynonymous mutations by means of the direction of selection (DoS) 13 unbiased statistic in TuMV-interacting proteins and non-interacting ones in the A. 14 thaliana lineage [54]. Overall, no significant difference exists between both groups (Fig.  15   7c). However, it is interesting that the NIb interactor SCE1, a highly connected protein 16 in the host, ranks second among the TuMV-interacting proteins with the largest DoS > 0 17 values (Fig. 7d). Besides SCE1, none of the other top 5 adapted TuMV-interacting 18 proteins listed in Fig. 7d have been previously related to responses to infection. Among 19 the most conserved TuMV-interacting proteins (DoS < 0), none have been previously 20 annotated as related to infection but, instead, with different aspects of plant metabolism 21 (Fig. 7d) Secondly, we evaluated the rates of evolution (dN and dS) for each protein in the 3 branch leading to the Arabidopsis genus to compute w = dN/dS (Fig. 7e). We found 4 significantly smaller w values for the TuMV-interacting proteins than for randomly 5 selected proteins (Mann-Whitney's U test, P = 0.0043; Fig. 7f). This allowed us to 6 conclude, in agreement with the results described above for pN/pS, that TuMV-interacting 7 proteins have been subject to purifying selection and are not significantly enriched in fast-8 evolving genes. In agreement with these observations, several studies have shown that 9 fast-evolving antiviral proteins may not be representative of the many other proteins that proteins rarely act in isolation within the cell; different protein-protein interactions define 22 major functional pathways crucial for a variety of cellular processes. In this regard, we 23 expect that our data and analyses can guide further studies with plant viruses. In 24 particular, together with known interactions among the different virus proteins and host 25 proteins (also obtained by Y2H screenings) [40,41], we were able to represent, for the 1 first time, a comprehensive network that characterizes the interaction between a plant 2 virus (TuMV) and one of its natural hosts (A. thaliana). In addition, this interaction 3 network may serve as a useful map to guide future biotechnological developments aimed 4 at obtaining plants with increased tolerance to viral infections. 5 In this direction, it is worth noting that the A. thaliana protein with the largest number [60]. In the case of potyviruses, the small protein 6K2 is responsible for anchoring the 20 replication complex to the endoplasmic reticulum to seed the formation of viral factories 21 [26], so the evaluation of the effect of these host proteins on viral infection would be 22

interesting. 23
The virus-host protein-protein interaction network reported here is limited in various 24 ways. Firstly, TuMV proteins P1 and CP are not included in the network. We were 25 unable to detect interactors of P1 and CP in yeast. P1 is an unstable protein only 1 expressed at the beginning of infection. In a previous report, protein complexes that 2 associate with P1 in planta were retrieved by following an alternative strategy of affinity 3 purification coupled to mass spectrometry [61]. By contrast, CP is the coat protein of the 4 virus and may display a self-binding ability in yeast that precludes the interaction with 5 other proteins [62]. A new screening with an increased expression level of CP might 6 help. Secondly, some binary interactions already reported between TuMV and A. 7 thaliana escaped our HT-Y2H screening [26]. Other interactions that were previously 8 captured by affinity purification and that we did not obtain, such as between HC-Pro and 9 the master RNA silencing factor ARGONAUTE 1 (AGO1) [63], might be rationalized as Interestingly, NIb also interacts with a TGA transcription factor (TGA1, a hub in the host 24 proteome), which is presumed to enhance the negative effect on SA-mediated resistance 25 of the aforementioned interaction, especially because this targeting will produce a 1 feedback response in the system by down-regulating the production of SA [69]. 2

Another essential component of the plant immune response affected by viral 3
infection is the production of reactive oxygen species. In particular, H2O2 is produced in 4 the apoplast by plasma membrane-associated NADPH oxidases. Then, H2O2 is 5 for their infection and replication [14], and (iv) close to other proteins involved in disease 18 symptoms [23]. Our results demonstrate that these principles generally hold true in the 19 case of a plant virus, while adding some nuances. 20 Firstly, we show that the set of TuMV targets is significantly enriched in hub 21 proteins, which suggests that the perturbation introduced by the virus is deliberate and 22 not merely random. This pattern has been described for several human viruses, including 23 both RNA viruses (e.g., Hepatitis C virus or Dengue virus) and DNA viruses (e.g., 24 Epstein-Barr virus) [48]. This way, the virus has the potential of dismantling the entire 25 system through a selective attack [21], which aligns with the so-called centrality-lethality 1 rule [19]. Moreover, by leaning on hub proteins of the host, a small set of viral proteins 2 can lead to the simultaneous rewiring of many cellular processes. However, hubs only 3 represent a small fraction of all nodes in a scale-free network, so this proportion is also 4 transmitted to the virus targets. Our results also reveal that TuMV selectively targets 5 proteins that, while not being highly connected, bridge different parts of the A. thaliana 6 interactome, thereby with increased ability to spread information (i.e., proteins that have 7 high neighborhood connectivity) [22,34]. In addition, the virus protein by itself may work 8 as a bridge to coordinate the perturbation among different host factors, as may be the case 9 of HC-Pro, P3N-PIPO, or NIb, by targeting proteins with very different functions in the 10

cell. 11
Secondly, by analyzing both the pN/pS and w ratios, we found that the TuMV targets 12 are significantly enriched in proteins whose genes have evolved slowly (i.e., conserved 13 genes). This is in tune with previous analyses of human genes targeted by viruses 14 [12,25,55] and suggests that viruses look for constrained elements with the aim of 15 ensuring broad host ranges. We then evaluated if even at slow rates these virus targets 16 have been subject to positive selection, finding non-significant results. This contrasts 17 with the concept of an evolutionary arms race, as well as with previous analyses 18 indicating that virus targets in humans have been adapted in response to infection [25]. 19 Perhaps, a new evolutionary study restricted to subsequences encoding viral protein 20 binding surfaces rather than whole gene sequences would offer more enlightening results 21 [24]. We also found that the TuMV targets display higher expression levels, which indeed 22 favors the interaction with multiple proteins (e.g., viral proteins) in multiple tissues. 23 Interestingly, it is well accepted that highly expressed genes tend to evolve at slower rates 24 [78], arguably because the exploration of new variants is more costly [79]. Thus, 25 conservation and expression are two sides of the same coin that the virus conveniently 1

exploits. 2
Thirdly, our functional analysis showed that the host proteins targeted by the virus 3 are significantly enriched in GO terms associated with defense and regulation, but also 4 with general metabolic processes. Although it is difficult to establish a frontier, the study 5 of the mode of action of multiple human viruses revealed that proteostasis, signaling (e.g., 6 JAK-STAT pathway, which acts downstream of IFN-a/b), transport, and RNA 7 metabolism are host functions preferentially targeted by RNA viruses, while 8 transcription, proteostasis, macromolecular assembly, DNA and RNA metabolisms, and 9 cell cycle (e.g., cancer pathways) are the focus of DNA viruses [14]. TuMV, a plant RNA 10 virus, seems to be closer to human RNA viruses, especially because it impairs hormone 11 signaling pathways, host protein expression at the level of translation, and the RNA 12 silencing machinery. 13 Fourthly, it has been argued that proteins involved in disease susceptibility and amplified by PCR from those cDNAs with the Phusion High-Fidelity DNA polymerase 10 (Thermo) by using suitable primers, including Gateway adapters, also listed in Dataset 11 S4. The constructs were then recombined into the plasmid pDONR207 by using the LR 12

Clonase II Enzyme mix (Invitrogen). All constructed plasmids were amplified in 13
Escherichia coli strain DH5a, purified, and verified by sequencing. 14 15 HT-Y2H screening. To identify the host proteins that interact with the eleven TuMV 16 proteins, an A. thaliana Col-0 cDNA library (Clontech) was screened by using the 17 Matchmaker Gold Yeast Two-Hybrid System (Clontech). For this, the Y187 haploid 18 yeast strain (with library proteins) and the Y2HGold haploid reporter strain (with the 19 plasmid pGBKT7 that expresses each of the 11 TuMV proteins) were mated and plated 20 on a double dropout medium (SD/-Leu/-Trp) containing 40 µg/mL X-a-Gal and 200 21 ng/mL aureobasidin A, and then incubated at 30 °C for 5 days. Co-transformants that 22 were phenotypically positive for a-galactosidase activity were subjected to a further, 23 more stringent phenotypic assay on a quadruple dropout medium (SD/-Leu/-Trp/-Ade/-24 His) containing X-a-Gal and aureobasidin A. Plasmids pGBKT7-T-antigen, pGADT7-25 laminin C, and pGADT7-murine p53 (Clontech) were used as negative controls. Colony 1 PCRs were performed from the yeast colonies that displayed a positive interaction to 2 eliminate duplicate clones, and prey plasmids were rescued with an isolation kit to be 3 subsequently transformed into E. coli DH5a for amplification and sequencing, as  Cultures of Agrobacterium tumefaciens strain C58 harboring appropriate binary 7 plasmids were grown overnight and then centrifuged, and OD600 was adjusted to 0.5 with Image processing was performed with ImageJ v1.8 [84].  Table  25 S1). Only long splicing variants were kept to minimize the complexity of the dataset. 1 Furthermore, proteins shorter than 50 amino acids were filtered out. Orthologous gene 2 groups were detected with OrthoFinder [92], which was run with default setting on the 3 nine genomes. The orthologous groups were pruned for duplicates by only keeping the 4 best BLAST hit against the A. thaliana gene copy. Further increment of the dataset was 5 done by allowing a certain amount of absence in some of the genomes, with the absence 6 being allowed if the closest relative was present (Fig. S2). Amino acid alignments were   Phylogeny of the three species used for the population study. b) Comparison between the 20 actual pN/pS distribution [computed as pN/(pS + 1) to account for genes with pS = 0] and a 21 representative null distribution (from randomly picked genes). * Statistical significance 22 (Mann-Whitney U test, P < 0.05). The inset shows the distribution of P values after 10 3 23 random realizations, with geometric mean 0.0016. c) Comparison between the actual 24 direction of selection (DoS) distribution and a representative null distribution (from 25