Phylogenomic Relationships in Insects
The phylogenetic relationships among the insects are shown in Figure 1. The analysis consistently produced strong statistical support for the evolutionary relationships among insect orders, with a bootstrap value of 100 at every node.
The tree separated into two clades: the first consists of hemimetabolous insects belonging to Hemiptera, Homoptera, and Isoptera, and the second includes holometabolous insects belonging to Diptera, Lepidoptera, Coleoptera, and Hymenoptera. The hemimetabolous insects formed a monophyletic group, within which Hemiptera and Homoptera were derived from a common ancestor. Within the holometabolous clade, Hymenoptera formed a monophyletic group, and Lepidoptera and Diptera shared a common ancestor and were together sister to Coleoptera.
Protein Family Diversity in Insects
The number of genes in protein families varied greatly among insects (Figure 2 and Table S1). Protein families involved in various metabolic and cellular processes were overrepresented in almost all insect species. These include families associated with cellular processes (Pkinase), digestion (Trypsin), binding activity (zf-C2H2, zf-H2C2_2, RRM_1, WD40, LRR_8, Ank_2, BTB), developmental processes (Chitin_bind_4 and Homeobox), cellular signaling (7tm_1, 7tm_6, and 7tm_7), membrane transporter activity (MFS_1 and Sugar_tr), oxidoreductase activity (p450), GTPase activity (Ras), and immunoglobulin (Ig_3) (Figure 2 and Table S1). Comparison of gene numbers within each protein family revealed order-specific expansions in several families. Trypsin was expanded in Diptera, and both Lig_chan and zf-H2C2_2 were expanded in Isoptera. The MADF_DNA_bdg, THAP, Kelch_1, and PIF1 protein families were expanded in Acyrthosiphon pisum (Homoptera). Additionally, some protein families showed species-specific expansion. For example, the BTB family was expanded in Cryptotermes secundus, Drosophila melanogaster, and Formica exsecta (Hymenoptera). 7tm_6, a sensory protein family, was expanded in species belonging to two different orders, F. exsecta and Tribolium castaneum (Coleoptera). Another example of species-specific expansion was observed for Ig_3, an immunoglobulin domain protein family, in D. melanogaster. 7tm_7 was expanded in two coleopteran species, Anoplophora glabripennis and T. castaneum, and Ion_trans was expanded in D. melanogaster (Figure 2, Table S1). Overall, these results suggest that insect species share a common set of protein families, with copy number expansions and contractions in specific families. Although copy number variation was order-specific in a few protein families, most of the diversity in copy number was species-specific.
Protein Family Variation and Correlation in Insects
To understand the variation and correlation of copy numbers among protein families, principal component analysis (PCA) was performed. PC1 explained the largest share of the variation (47.26%), followed by PC2 (16.81%), PC3 (9.52%), PC4 (7.82%), and PC5 (4.79%) (Figure 3).
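As an illustration of how per-component percentages like those above are obtained, the following minimal sketch computes the share of variance explained by each principal component. It is not the pipeline used in this study: the count table is hypothetical, the columns are only centered (not scaled), and the eigenvalues are found by simple power iteration with deflation, all of which are assumptions made for the example.

```python
import math

# Hypothetical copy-number table (NOT data from this study):
# rows are protein families, columns are three unnamed species.
COUNTS = [
    [250, 240, 300],
    [200, 60, 80],
    [90, 80, 110],
    [150, 160, 170],
    [80, 180, 60],
]

def center_columns(rows):
    """Subtract each column mean so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of the centered columns."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenpair(mat, iters=500):
    """Dominant eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    p = len(mat)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        prod = [sum(mat[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in prod))
        if norm < 1e-12:  # no variance left to extract
            return 0.0, vec
        vec = [x / norm for x in prod]
    value = sum(vec[i] * mat[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec

def explained_variance_ratio(rows, n_components):
    """Fraction of total variance captured by each of the leading components."""
    cov = covariance(center_columns(rows))
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))  # trace = total variance
    ratios = []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        ratios.append(value / total)
        # Deflate: remove the component just found before searching for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return ratios

ratios = explained_variance_ratio(COUNTS, 3)
for k, r in enumerate(ratios, start=1):
    print(f"PC{k}: {100 * r:.2f}%")
```

With as many components as variables, the ratios sum to one; in practice a numerical library would normally replace the hand-written eigenvalue routine.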
The PCA of insect species (variables) placed the species in two dimensions that accounted for 82.4% and 4.1% of the variability, respectively. The highest contributions to variability in protein families were observed for Anopheles gambiae (Diptera), Danaus plexippus (Lepidoptera), Nicrophorus vespilloides (Coleoptera), Agrilus planipennis (Coleoptera), Dendroctonus ponderosae (Coleoptera), Laodelphax striatellus (Hemiptera), and Nilaparvata lugens (Hemiptera), whereas the contributions of the other species were moderate (Figure 4). The lowest contribution to variability was observed for A. pisum (Figure 4). Overall, copy number variation in protein families was similar across insects: pairwise comparisons of protein family copy numbers showed a strong positive correlation among species (Figure 5), with the exception of C. secundus and A. pisum, which showed a lower positive correlation with the other species. C. secundus and A. pisum were positively correlated with each other (Figure 4).
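The pairwise comparison described above can be sketched as a correlation matrix over per-species copy-number vectors. The example below assumes Pearson correlation, which the text does not specify, and uses hypothetical species labels and counts rather than data from this study.

```python
import math

# Hypothetical copy numbers for five protein families in three species
# (illustrative values only; each list follows the same family order).
SPECIES_COUNTS = {
    "species_A": [250, 200, 90, 150, 80],
    "species_B": [240, 60, 80, 160, 180],
    "species_C": [300, 80, 110, 170, 60],
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(table):
    """Correlation for every ordered pair of species, keyed by (name, name)."""
    names = list(table)
    return {(a, b): pearson(table[a], table[b]) for a in names for b in names}

corr = correlation_matrix(SPECIES_COUNTS)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:.3f}")
```

A species whose family-level counts deviate from the shared pattern, as reported here for C. secundus and A. pisum, would appear as a row of lower coefficients in such a matrix.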
The PCA of protein families (individuals) showed that most protein families were similar across insects; they clustered together near the center of the axes. The protein families PBP_GOBP, CBM_14, Homeobox, 7tm_6, COesterase, Ras, Ig_3, LRR_8, p450, and Chitin_bind_4 were positively correlated as one group, while Kelch_1, THAP, MADF_DNA_bdg, BTB, Pkinase_Tyr, I-set, adh_short, Ank_2, Sugar_tr, 7tm_1, RRM_1, WD40, and zf-C2H2 were positively correlated as a second group (Figure 6). Regarding variability, the protein families zf-H2C2_2, Trypsin, Pkinase, 7tm_6, THAP, H_psq, and 7tm_7 made the highest contributions in almost all principal components (PCs) (Figure 7).
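For readers unfamiliar with contribution values, one common definition expresses the contribution of an individual (here, a protein family) to a component as its squared score divided by the sum of squared scores on that component, in percent. The sketch below applies that definition to hypothetical PC1 scores; whether the analysis in this study used exactly this definition is an assumption.

```python
# Hypothetical PC1 scores for five protein families (illustrative values only,
# not scores from this study).
PC1_SCORES = {
    "Pkinase": 4.0,
    "Trypsin": -2.0,
    "p450": -1.0,
    "WD40": 1.0,
    "BTB": -2.0,
}

def contributions(scores):
    """Percent contribution of each individual to one principal component."""
    total = sum(s * s for s in scores.values())
    return {name: 100.0 * s * s / total for name, s in scores.items()}

contrib = contributions(PC1_SCORES)
# List families from the largest to the smallest contribution.
for name, pct in sorted(contrib.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.1f}%")
```

Because the scores are squared, the sign of a score does not matter: families far from the origin in either direction contribute most.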