An Interactome Map of Maize (Zea mays L.)


Interactomes are powerful tools for encoding and decoding complex biological systems. Here, we generated a maize interactome map that integrates genomic interactions, transcriptomic co-expression networks, translatomic co-expression networks, and protein–protein interactions throughout the maize life cycle. This map, containing over 9 million interactions in more than 5,000 functional modules, reveals extensive functional divergence among duplicate genes and a progressive increase in regulatory divergence between the two maize subgenomes as genetic information flows from the genome to the proteome. Combined with machine learning, a big-data mining technique, this network enables the dissection and validation of gene functions, the reconstruction of regulatory pathways, and the deciphering of molecular mechanisms underlying complex traits. By applying this map to flowering time, we identified 1,843 high-confidence genes enriched in eight molecular pathways related to flowering time. The functions of 30 of the 58 tested genes, including 27 novel genes, were verified by loss-of-function mutagenesis. Furthermore, a new pathway involving histone modification was identified and confirmed to regulate flowering time. The interactome map illustrates how coherent sets of molecular interactions connect different types of functional elements and pathway modules into a genome-wide functional wiring landscape of maize, and the approach should be applicable to a wide range of species.

To construct a genome-wide protein–protein interactome (PPI), we used eight bait and eight prey cDNA libraries from eight distinct tissues covering all developmental stages, and screened for interactions via 130 library-vs-library matings using the RLL-Y2H [...] Table 4).

The multi-omic interactomes exhibited markedly different topological hierarchies (Fig. 1c-1f). Based on different confidence cut-offs (see Methods), the interactomes were grouped into low-, middle-, and high-confidence versions that showed similar trends [...] Table 2), exhibiting a distinct topological structure for each regulatory layer (Fig. 1c-1f).

We integrated the interaction networks from the different layers. In all, we generated over 2 trillion bases of sequence, detected over 182,995 functional elements, and ultimately constructed an interactome with over 3 million edges (Fig. 1h; Supplementary Fig. 5).

Diverse types of regulatory elements interacted in this network. For example, teosinte glume architecture1 (tga1) 11 showed previously unreported interactions with lncRNAs, circRNAs, and miRNAs (Fig. 1i). We have developed a user-friendly website that stores all the interactome information (http://cbi.hzau.edu.cn/interactome/index.php) and can be searched by gene name or node information. A comparison of our interactome with previously generated interactomes revealed significant conservation of interactions (Supplementary Fig. 6a-6j), and over 50% of PPIs could be validated using other biological techniques (Supplementary Table 5; [...] there is no evidence for regulatory bias between the two subgenomes 12,13. We detected no or only subtle differences between the subgenomes at the transcriptome level. However, significant differences were detected at both the translatome and proteome levels ([...]

To further explore the evolutionary relevance of subgenome divergence, we focused on hub genes at different omic layers. Intriguingly, Maize1 hub genes at both the transcriptome and translatome levels were significantly over-represented (P < 0.01) in genomic regions targeted during domestication and improvement 14; however, Maize1 hub genes at the proteome level were not over-represented (Fig. 2c). Accordingly, Maize2 hub
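The over-representation of hub genes in domestication and improvement regions can be illustrated with a one-sided hypergeometric test. This is an assumption: the text reports P < 0.01 but does not name the test, and the gene universe, the hub-gene list for a given subgenome and omic layer, and the set of genes located in selected regions are all treated here as inputs prepared elsewhere. The function names are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """Upper tail P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible configurations contribute nothing to the sum.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def hub_enrichment(
    all_genes: Iterable[str],
    hub_genes: Iterable[str],
    selected_region_genes: Iterable[str],
) -> Tuple[int, float]:
    """One-sided test of whether hub genes are over-represented in selected regions.

    Returns (number of hub genes inside selected regions, upper-tail P value).
    """
    universe = set(all_genes)
    hubs = set(hub_genes) & universe
    selected = set(selected_region_genes) & universe
    overlap = len(hubs & selected)
    p_value = hypergeom_sf(overlap, len(universe), len(selected), len(hubs))
    return overlap, p_value
```

A call such as `hub_enrichment(maize1_genes, maize1_transcriptome_hubs, genes_in_selected_regions)` (hypothetical variable names) would return the observed overlap and its P value; running it per omic layer mirrors the layer-by-layer comparison in Fig. 2c.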

Maize1 and Maize2 represent homologous genes from subgenomes Maize1 and Maize2, respectively. "Maize1 without Maize2" indicates Maize1 genes whose corresponding Maize2 homologs were lost; "Maize2 without Maize1" indicates Maize2 genes whose corresponding Maize1 homologs were lost. b, Biased regulatory fractionation at the transcriptome (top), translatome (middle), and proteome (bottom) levels is observed for the reconstructed ("sorghumized") maize ancestral chromosome pair corresponding to chromosome 1. A dominant bin is defined as a genomic region in which more genes in one subgenome have a significantly higher degree of bias than their homologs in the other subgenome.

Lines are LOESS regression lines. The pie chart on the right shows the number of dominant bins from each subgenome. Chi-square test, respectively. Grey, yellow, and blue bars represent genome-wide, Maize1, and Maize2 genes, respectively.
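The dominant-bin definition in this legend, together with the Methods description of a Chi-square test over sliding windows of 100 genes in sorghum gene order, can be sketched as follows. Several details are assumptions not stated in the text: each homologous pair is pre-labelled 1 (Maize1 higher), 2 (Maize2 higher), or 0 (no significant bias); the expected split between subgenomes is 1:1; the window step is one pair; and alpha is 0.05 with no multiple-testing correction. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, chi2 = Z**2, so P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def dominant_bins(
    labels: Sequence[int], window: int = 100, step: int = 1, alpha: float = 0.05
) -> List[Tuple[int, str, float]]:
    """Scan homologous pairs ordered along a sorghum chromosome.

    labels[i] is 1 if the Maize1 gene of pair i shows the significantly higher
    degree, 2 if the Maize2 gene does, and 0 if neither does (assumed encoding).
    Returns (window start, dominant subgenome, P value) for significant windows.
    """
    results = []
    for start in range(0, len(labels) - window + 1, step):
        chunk = labels[start:start + window]
        n1, n2 = chunk.count(1), chunk.count(2)
        if n1 + n2 == 0:
            continue  # no biased pairs in this window, nothing to test
        # Goodness-of-fit against an expected 1:1 split between the subgenomes.
        stat = (n1 - n2) ** 2 / (n1 + n2)
        p_value = chi2_1df_pvalue(stat)
        if p_value < alpha:
            results.append((start, "Maize1" if n1 > n2 else "Maize2", p_value))
    return results
```

Counting how many significant windows favour each subgenome gives the kind of tally summarized in the pie chart; the closed-form 1-df tail avoids any dependency beyond the standard library.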

The interactome recapitulates the networks of important genes
The interactome map can reconstruct the networks of well-known genes and uncover new crosstalk genes with similar functions. We explored an interaction subnetwork involving three key maize tillering genes, Teosinte Branched1 (TB1) 16, Grassy Tillers1 (GT1) 17, and Tassels Replace Upper Ears1 (TRU1) 18. Loss-of-function mutations in any of these three genes lead to more tillers, a characteristic of the maize ancestor teosinte. TB1, GT1, and TRU1 interacted with ZmALOG1 (Zm00001d003057) and ZmALOG2 (Zm00001d032696), two functionally uncharacterized genes belonging to the ALOG (Arabidopsis LSH1 and Oryza G1) transcription factor family (Fig. 3a). Intriguingly, loss-of-function mutations in ZmALOG1 and ZmALOG2 caused enhanced tillering, similar to that of tb1, gt1, and tru1 mutants (Fig. 3b; Supplementary Fig. 12a). The interactions between ZmALOG1/2 and TRU1 were confirmed by Y2H assays (Fig. 3c). Moreover, ZmALOG1 and ZmALOG2 also interact with TB1 (Fig. 3c). These results demonstrate that the interactions between known and unknown genes in subnetworks have biological meaning.
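Finding uncharacterized genes that link several known genes, as ZmALOG1 and ZmALOG2 link TB1, GT1, and TRU1, amounts to collecting shared first neighbours of a seed set. The sketch below assumes an undirected, unweighted edge list and a hypothetical `min_seeds` threshold; it is an illustration, not the procedure used to build Fig. 3a, and the toy node names carry no biological meaning.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; self-loops are ignored."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def shared_partners(
    adj: Dict[str, Set[str]], seeds: Iterable[str], min_seeds: int = 2
) -> Dict[str, List[str]]:
    """Non-seed nodes connected to at least `min_seeds` seed genes, with the seeds they touch."""
    seed_set = set(seeds)
    hits: Dict[str, Set[str]] = defaultdict(set)
    for seed in seed_set:
        for partner in adj.get(seed, ()):
            if partner not in seed_set:  # edges between seeds are not candidates
                hits[partner].add(seed)
    return {p: sorted(s) for p, s in hits.items() if len(s) >= min_seeds}


# Toy network (hypothetical): c1 touches all three seeds, c2 touches two, x only one.
toy = [("A", "c1"), ("B", "c1"), ("C", "c1"), ("A", "c2"), ("B", "c2"), ("C", "x"), ("A", "B")]
print(shared_partners(build_adjacency(toy), ["A", "B", "C"]))
```

With TB1, GT1, and TRU1 as seeds, a candidate would be reported only if the interactome actually contains its edges to those genes; raising `min_seeds` trades recall for stricter candidates.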

Therefore, our maize interactome can be used to reliably predict the functions of genes of interest, as well as their putative interaction pathways, shedding light on the regulatory mechanisms of known and unknown genes.

[...] Table 6). We identified 55 of the 63 genes in the interactome and divided them into two groups: 40 for constructing kernel-related subnetworks and 15 for validating the robustness of the kernel-related regulatory network. The subnetworks of 40 randomly selected kernel genes (1,000 simulations) successfully predicted up to 40% of the validated kernel genes, a value significantly higher (P = 1E-6) than that obtained with randomly selected, functionally unrelated genes (Supplementary Fig. 13a). Notably, all 55 kernel genes could be assembled into a linked subnetwork in the maize interactome (Fig. 3f), which was further clustered into eight [...]

[...] values ranging from 0.68 to 0.89 (Fig. 4a). Notably, the integrative interactome had the highest AUC value (up to 0.9). In total, 3,553 genes were predicted to be associated with FT at different confidence levels (Fig. 4b; Supplementary Table 10), suggesting that an ultracomplex molecular mechanism underlies FT in maize. We calculated the shortest distance (SD) to known validated FT genes for all predicted FT genes and for randomly selected genes, and found that the predicted FT genes had significantly lower SD values than randomly selected genes (Fig. 4b). Using more restrictive filtering based on an empirical cut-off of the SD distribution, we obtained 1,843 high- [...]

[...] Table 13). The unexpected discovery of a core gene of this newly identified FT pathway in maize lays the foundation for future studies of the relationship between FT and vernalization during domestication 13,32.
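The shortest-distance (SD) comparison between predicted FT genes and known FT genes can be computed for every node at once with a multi-source breadth-first search. This sketch assumes unweighted, undirected edges and defines a gene's SD as the hop count to its nearest validated FT gene; the text does not say whether edge weights or the nearest versus average distance were used, so both choices are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def distances_to_known(
    edges: Iterable[Tuple[str, str]], known: Iterable[str]
) -> Dict[str, int]:
    """Multi-source BFS: hop count from each reachable node to its nearest known gene."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    dist = {gene: 0 for gene in known}  # known genes are at distance 0
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:  # first visit is the shortest in an unweighted graph
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_sd(dist: Dict[str, int], genes: Iterable[str]) -> float:
    """Mean SD over genes that can reach a known gene; inf if none can."""
    values = [dist[g] for g in genes if g in dist]
    return sum(values) / len(values) if values else float("inf")
```

Computing `mean_sd` for the predicted FT genes and for many random gene sets of the same size would give an empirical null distribution, one possible way to derive an empirical SD cut-off like the one described above.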

Since we systematically uncovered and validated a series of molecular pathways underlying FT in maize, we asked whether genes from these pathways contribute in [...]

Genetic information is derived from the genome and transferred to the transcriptome, the translatome, and finally the proteome. This process is regulated simultaneously at multiple levels 1. Dissecting these regulatory networks at the multi-omic level is a highly efficient way to explore the functional genomics and biological systems of an organism, which are highly dynamic and complex 1. We assembled a comprehensive interactome that contains genomic, transcriptomic, translatomic, and proteomic interactions. Our data demonstrate that functional divergence exists at different layers and leads to gradually [...]

[...] Transport and Maize Branching. Plant Physiol. 147, 1913–1923 (2008).

[...] was reverse-transcribed to cDNA, which was used as a template to prepare the mRNA-seq,

Identification and quantification of fusion RNAs
Mapsplice-v2.1.8 56 was used to identify fusion RNAs from the total RNA-seq data of each tissue. The following filtering steps were performed to obtain high-confidence fusion RNAs. First, polyA sequences were removed using the following criteria: the proportion of As and Ts was > 40%, and the proportion of As and Ts was > 80%. Second, to remove fusion RNAs formed by homologous genes, the two parental genes (overlapping sequences > 50 bp) that formed each fusion RNA were identified, and the similarity between the parental genes was monitored. If the corresponding parental genes could not be found (overlapping sequences < 50 bp), the sequences were extended 500 bp

After removing rRNAs, the remaining RNAs were used to construct libraries, which were sequenced on the Illumina HiSeq 2500 platform.

After adaptor removal and quality control with FASTX_Toolkit-0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/index.html), using fastx_clipper with the parameters "-l 5 -c -n -v -Q33" and fastx_trimmer with the parameters "-f 1 -Q 33", the Ribo-Seq reads were aligned to the rRNA reference sequence from NCBI using Bowtie-1. [...]

[...] at 30°C. Each library mating was repeated at least 10 times. A barcode sequence was added to each repeat, and ten repeats were pooled to construct a sequencing library.
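To make the fastx_clipper options above concrete, the sketch below mimics their effect on a single read: `-c` discards reads in which no adapter was found, `-l 5` discards inserts shorter than 5 nt, and `-n` keeps reads containing N; `-v` only adds verbose output, and `-Q33` sets the FASTQ quality offset, so neither changes which reads survive. This is a simplified Python analogue, not FASTX_Toolkit's own adapter-alignment algorithm: the adapter sequence is not given in this section, so `ADAPTER` is hypothetical, and matching only the adapter's first `seed_len` bases exactly is an assumption. fastx_trimmer with `-f 1` keeps reads from their first base, so it is not modelled here.

```python
from typing import Optional

# Hypothetical adapter: the actual adapter sequence is not stated in this section.
ADAPTER = "AGATCGGAAGAGCACACGTCT"


def clip_read(
    seq: str,
    adapter: str = ADAPTER,
    min_len: int = 5,
    seed_len: int = 10,
    keep_unclipped: bool = False,
) -> Optional[str]:
    """Simplified analogue of `fastx_clipper -l 5 -c -n`; returns the insert or None."""
    pos = seq.find(adapter[:seed_len])  # exact match of the adapter's first bases only
    if pos == -1:
        if not keep_unclipped:
            return None  # -c: discard reads in which no adapter was found
        insert = seq
    else:
        insert = seq[:pos]  # keep the bases upstream of the adapter
    if len(insert) < min_len:
        return None  # -l 5: discard sequences shorter than 5 nt after clipping
    return insert  # -n: reads containing N are kept, so there is no N filter
```

Keeping only adapter-containing reads (`-c`) suits Ribo-seq, where ribosome footprints are much shorter than the read length and a genuine footprint read should therefore run into the adapter.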

Yeast plasmid isolation and PPI sequencing
Following yeast mating, the clones on the plates for each repeat were collected, and the fusion plasmids were extracted from the yeast cells using a yeast plasmid kit (OMEGA).

Barcoded vector primers (third-generation sequencing barcodes; Supplementary Table 14) were used to amplify the plasmids from each repeat. The PCR products were further purified using DNA-free beads to remove fragments shorter than 0.75 kb. [...] sequence were aligned consistently, the proteins encoded by the two genes were considered to be interacting proteins.
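Because each library mating was split into barcoded repeats that were pooled for sequencing, reads first have to be assigned back to their repeat before bait–prey pairs can be tallied. The sketch below assumes that each barcode sits at the start of the read and must match exactly, that barcodes have equal length and are not prefixes of one another, and that bait and prey genes were assigned to each read by an upstream alignment step; none of these details are specified in the text, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def assign_repeat(read: str, barcodes: Dict[str, str]) -> Optional[str]:
    """Return the repeat whose barcode the read starts with (exact match), else None."""
    for repeat_id, barcode in barcodes.items():
        if read.startswith(barcode):
            return repeat_id
    return None


def repeats_per_pair(
    records: Iterable[Tuple[str, Optional[str], Optional[str]]],
    barcodes: Dict[str, str],
) -> Dict[Tuple[str, str], int]:
    """Count distinct repeats supporting each (bait gene, prey gene) pair.

    Each record is (raw read, bait gene, prey gene); the gene assignments are
    assumed to come from an upstream alignment step and may be None.
    """
    support: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for read, bait, prey in records:
        repeat_id = assign_repeat(read, barcodes)
        if repeat_id is None or bait is None or prey is None:
            continue  # unassigned barcode or incompletely mapped read
        support[(bait, prey)].add(repeat_id)
    return {pair: len(ids) for pair, ids in support.items()}
```

Counting distinct repeats rather than raw reads keeps one heavily amplified clone from dominating the evidence for a pair; how the authors weighted repeat support is not stated in this section.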

The PPIs were divided into three confidence groups based on the probability of self-activation. Low-confidence PPIs are interaction edges detected using offline data.

Because the mating of yeasts with empty vectors generated interaction signals (self-

To obtain a network containing high-confidence small RNAs with potential functional implications, co-expression networks were constructed using 25,300 genes, 5,469 lncRNAs, 14 fusion RNAs, 4,318 circRNAs, and 159 microRNAs. High-, middle-, and low-confidence edges were identified as described above. In addition, a co-expression network including only 24,394 annotated genes was constructed and named the Slim-transcriptome network; high-, middle-, and low-confidence edges were identified in this network as described above.
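The tiered co-expression edges described above can be illustrated with an all-pairs correlation scan. The edge criteria referred to as "described above" are not part of this excerpt, so the choice of Pearson correlation on expression profiles across tissues or stages, the use of |r|, and the cut-off values in the usage example are all hypothetical; the sketch only shows how each edge could be assigned to the strictest tier it passes.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(
    expr: Dict[str, Sequence[float]], cutoffs: Dict[str, float]
) -> List[Tuple[str, str, float, str]]:
    """All feature pairs whose |r| passes a cut-off, labelled with the strictest tier passed."""
    tiers = sorted(cutoffs.items(), key=lambda kv: kv[1], reverse=True)
    edges = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson(expr[a], expr[b])
        for label, cut in tiers:
            if abs(r) >= cut:
                edges.append((a, b, r, label))
                break  # assign each edge to a single (highest) tier
    return edges


# Hypothetical profiles over four tissues and hypothetical tier cut-offs.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4]}
for edge in coexpression_edges(profiles, {"high": 0.95, "middle": 0.75, "low": 0.6}):
    print(edge)
```

At the scale of the networks above (tens of thousands of features), an all-pairs loop in pure Python would be slow; a vectorized correlation matrix is the usual choice, and the loop here is kept only for readability.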

Four omic data sets, namely ChIA-PET data from Peng et al. 9, a co-expression network constructed using RNA-Seq data from 26 tissues or stages, a co-expression network constructed using Ribo-seq data from 20 tissues or stages, and PPIs from yeast two-hybrid screening, were used to construct an integrative omic database including all possible temporal and spatial interactions. An integrative EW (EWi) was calculated as follows. If an edge was present in one omic data set, its EW was raised to the fourth power to obtain the EWi. If an edge was present in two omic data sets, EWi was calculated by multiplying the square of the average EW by each of the two EWs. If an edge was present in three omic data sets, EWi was calculated by multiplying the average EW by each of the three EWs. If an edge was present in four omic data sets, [...] Supplementary Fig. 8g,8h).

Additionally, the dominant subgenome bins were identified with a Chi-square test of whether, within a sliding window of 100 genes ordered along the sorghum chromosomes, the proportion of genes from one subgenome was significantly higher than that of the corresponding genes from the other subgenome (Supplementary Fig. 9).

[...] Fig. 11a). Analysis of nucleotide diversity in wild maize relatives, landraces, and modern inbreds indicated that ZmNAC75 has been a selection target during modern maize improvement (Supplementary Fig. 11b).

[...] Shared edges between the transcriptome, translatome, and proteome in our study and in randomly generated interaction datasets at all confidence levels. The sharing rates of edges in our study (c, d, e) are significantly higher than those in the randomly generated interaction datasets (f, g, h).
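The integrative edge weight (EWi) rules in the Methods above follow one pattern: for an edge found in k omic data sets with weights EW_1 ... EW_k, EWi = (mean EW)^(4 − k) × EW_1 × ... × EW_k, so every case is a product of four factors, and an edge whose EWs all equal w gets EWi = w^4 whatever k is. The sketch below implements that reading for k = 1 to 3 as stated; the four-data-set rule is truncated in this excerpt, so using the plain product of the four EWs (the same pattern with k = 4) is an assumption. The function names and the layer dictionary layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def integrative_edge_weight(weights: Sequence[float]) -> float:
    """EWi of one edge from its EWs in the k omic data sets where it occurs (1 <= k <= 4)."""
    k = len(weights)
    if not 1 <= k <= 4:
        raise ValueError("an edge must be present in 1 to 4 omic data sets")
    mean_ew = sum(weights) / k
    # k = 1: EW**4; k = 2: mean**2 * EW1 * EW2; k = 3: mean * EW1 * EW2 * EW3.
    # k = 4 is truncated in the Methods; mean**0 * product is an ASSUMED extrapolation.
    return mean_ew ** (4 - k) * math.prod(weights)


def integrate_layers(layers: Dict[str, Dict[Edge, float]]) -> Dict[Edge, float]:
    """Collect each undirected edge's EWs across layers, then compute its EWi.

    Assumes every layer lists a given undirected edge at most once.
    """
    per_edge: Dict[Edge, List[float]] = {}
    for edge_weights in layers.values():
        for (a, b), ew in edge_weights.items():
            per_edge.setdefault(tuple(sorted((a, b))), []).append(ew)
    return {edge: integrative_edge_weight(ws) for edge, ws in per_edge.items()}
```

For example, an edge with EW = 0.4 in one layer and 0.6 in another gets 0.5² × 0.4 × 0.6 = 0.06, while an edge seen only once with EW = 0.5 gets 0.5⁴ = 0.0625. Keeping the degree fixed at four is what makes these numbers directly comparable across edges supported by different numbers of layers.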