Draft genome assembly
We obtained 720 million Chromium linked reads of a male C. esakii genome (Table S1) and performed draft genome assembly with 746,666 linked reads (56x coverage of the expected genome size of 200 Mb) using the Supernova2 assembler. The assembled genome was 193 Mb in length and 2.0 Mb in contig N50 (Table 1); it showed high completeness (98.2%) in terms of BUSCO score, and 23,767 protein-coding genes were predicted (Table 1). We used this draft genome as the reference genome in the following analyses.
Transcriptomic data analyses of all genes
We obtained transcriptomic data by RNA-seq for male and female pupae of C. arrowianus (Ar), C. insulicola (In), C. komiyai (Ko), and C. esakii (Es) from genital tissue of two pupal stages: the early pupal stage (PE) at 1–3 days after pupation and the late pupal stage (PL) at 4–6 days after pupation (Fig. 1; Table S2). We mapped all of the transcriptome sequence reads to the C. esakii draft genome and obtained 9,778 genes, which were mapped across the development stages with an average of 10 reads. Expression variation analyses of these genes showed mean percentages of expression variance that were explained by species, stage, and sex of 10.1% (max, 77.0%), 8.86% (max, 64.3%), and 3.03% (max, 71.9%), respectively (Fig. S1). Principal components analyses (PCA) of the expression profiles of these genes showed that individual samples were not clustered discretely by species, stage, and sex (Fig. S2). However, there were significant differences in principal component scores 1 (PC1) and 2 (PC2) among species, between stages, and between sexes (Table S3). Thus, the effects of species, stage, and sex on the variation in gene expression were limited.
DEGs among species and gene clusters of DEGs
To investigate differences in gene expression between species associated with length differences in the copulatory piece and vaginal appendix, we compared gene expressions for each stage and sex among the four species. We found 1536 and 1306 DEGs (false discovery rate [FDR] < 0.05) in the PE and PL stages of males, respectively, and 546 and 1959 DEGs in the PE and PL stages of females, respectively. We classified these DEGs according to the different expression profiles among the species for each stage and sex using a hierarchical clustering method and constructed dendrograms that showed the similarity of the expression profiles among the four species (Figs. 2, 3).
In the male PE stage (MPE), the DEGs were divided into six clusters; in all of these clusters, pairs of species with a long and a short copulatory piece showed similar expression patterns, and no cluster contained pairs of species with similar copulatory piece length that showed similar expression profiles (Fig. 2a). In the male PL stage (MPL), the DEGs were divided into eight clusters (MPL1–MPL8), and the DEGs in MPL8 showed a divergence between the species with long (Ar and In) and short (Ko and Es) copulatory pieces (Fig. 2b). The expression profiles of DEGs in the MPL3 cluster were similar between Ar and In, and the DEGs in the MPL4 cluster were similar between Ko and Es.
In the female PE stage (FPE), the DEGs were divided into five clusters, and the DEGs in the FPE5 cluster showed similar expression profiles between Ar and In and between Ko and Es, which were species pairs with similar vaginal appendix lengths. However, the FPE5 cluster contained only seven genes (Fig. 3a). The expression profiles of the DEGs in the other clusters did not show similar expression between Ar and In or between Ko and Es. In the female PL stage (FPL), DEGs were divided into five clusters, and the DEGs in the FPL1 cluster showed similar expression between Ar and In, and between Ko and Es, and the DEGs in the FPL5 cluster showed similar expression profiles between Ko and Es (Fig. 3b).
We assume that the DEGs in the clusters with similar expression profiles between two species with similar genital lengths, Ar and In (long) and Ko and Es (short), are potentially involved in the development of genital parts with different lengths among species. The MPL8 and FPL1 clusters showed this long vs. short genital pattern, and these were included in the search for genes involved in the genital differences.
Gene clusters potentially related to genital morphogenesis and interspecific genital size differences
We performed GO enrichment analyses to identify clusters enriched with functions related to genital development and thus potentially related to interspecific genital size differences. We assumed that genes involved in “imaginal disc development” and “cuticle development” were involved in genital development (see Materials and Methods). In the male clusters, MPE2, MPE4, and MPL4 were enriched with GO terms for imaginal disc development, and the MPL4 cluster was also enriched with GO terms for cuticle development (Table 2). The FPL1 cluster was enriched with GO terms for imaginal disc development and cuticle development in the female clusters. We also determined which clusters contained genes involved in genital development (see Material and Methods). In the male clusters, MPE2 contained Abdominal-B (Abd-B). In the female clusters, FPE2 contained doublesex (dsx), and FPL1 contained Abd-B.
Thus, we selected MPE2, MPE4, MPL4, FPE2, and FPL1 as the candidate gene clusters for genital morphogenesis. Of these, MPE2, MPE4, and FPE2 did not show similar gene expression profiles between species with similar genital lengths, and hence these might not be related to the genital differences. Therefore, MPE2, MPE4, and FPE2 were excluded from the following analyses. The remaining MPL4 and FPL1 male and female candidate gene clusters shared 105 genes, which accounted for 41.5% of genes in MPL4.
DEGs involved in interspecific differences in genital size
Having identified two clusters in males (MPL4, MPL8) and two clusters in females (FPE5, FPL1) as candidate clusters that may contain genes involved in differences in genital part length, we focused on genes with the GO terms “imaginal disc development” and “cuticle development”. We also focused on TFs because these are directly involved in regulating the expression of many genes and are important for organ development. Expression changes in TFs can be primary factors involved in differences in genital size. We specifically focused on whether significant differences existed between species with long vs. short genital parts in our comparison of expression patterns. When the expression profiles were not explained by differences in genital length, we investigated which of the four comparisons between species with different genital lengths (Ar vs. Ko, Ar vs. Es, In vs. Ko, and In vs. Es) showed significantly different expression profiles.
In males, MPL4 contained 5 TFs, 29 genes with the GO term “imaginal disc development”, and 13 genes with the GO term “cuticle development” (Table S4). Of these, the expression levels of a TF (bunched [bun]) and three genes with imaginal disc development (CG14073, Ubiquitin specific protease 8 [Usp8], and combover [cmb]) were significantly different in the comparisons between one of the species with long genital parts (Ar) and those with short genital parts (Ko and Es) in the PL stage. No genes showed significant differences in the other species with long genital parts (In) and those with short genital parts. These four genes in MPL4 showed higher expression levels in Ar than the other species, and although not significant, CG14073, Usp8, and cmb showed lower expression in In than Ko and Es (Fig. 4). Therefore, these four genes may be involved in the formation of a long copulatory piece only in Ar. MPL8 contained a TF (Mediator complex subunit 24 [MED24]) and a gene with cuticle development (obstructor-A [obst-A]). These two genes showed significant differences in expression profiles between species with long vs. short genital parts (Table S4). MED24 and obst-A showed higher expression levels in Ar and In than Ko and Es in the PL stage, except for post hoc comparisons between In vs. Ko or Es (Fig. 4). Thus, high expressions of MED24 and obst-A may be commonly involved in differences in copulatory piece length in these species.
In females, the FPE5 cluster contained no TF or gene with the GO term “imaginal disc development” or “cuticle development”, so we did not find candidate genes for differences in genital length. In the FPL1 cluster, we found 15 TFs, 48 genes involved in imaginal disc development, and 21 genes involved in cuticle development. Of these, 15 TFs, 44 genes for imaginal disc development, and 12 genes for cuticle development showed significant differences in the expression profiles between species with long and short vaginal appendices (Table S5). Abd-B was included in these genes (Fig. S3, Table S5). Three genes, dimmed (dimm; TF), Formin homology 2 domain containing (Fhos; imaginal disc development), and Cuticular protein 56F (Cpr56F; cuticle development), showed the lowest FDRs. In the PL stage, these genes showed significantly lower expression in Ar and In than Ko and Es, and post hoc comparisons indicated significant differences between species with long vs. short genital parts (Fig. 4). Therefore, dimm, Fhos, and Cpr56F may be commonly involved in differences in the vaginal appendix length between species with long (Ar and Ko) and short genital parts (Ko and Es).