Differential gene expression analysis
The wild and Pde6c mutant Danio rerio comparison transcript counts (matrix file) were used for differential gene expression with the Empirical analysis of Digital Gene Expression in R (edgeR) package of Bioconductor with primary parameters like the false discovery rate (FDR), log fold change (logFC), log counts per million (logCPM), and p-value [12, 13]. Unigenes with p-values less than 0.001 (p < 0.001) and fold change of more than 4 (logFC > 4) were considered significantly differentially expressed. We discovered 216 upregulated and 3,527 downregulated DEGs in the pde6c mutant conditions. The hierarchical clustering heatmap, MA plot, and volcano plots were generated to represent the up- and downregulated genes (logFC ± 4 and p < 0.001; Figure 1). A heatmap is a graphical representation of data that uses a color-coding system to represent different values. Figure 1A represents the up- and downregulated genes in green and red, respectively.
A volcano plot is a type of scatterplot that is used to quickly identify changes in large datasets composed of replicate data. It combines a measure of statistical significance from a statistical test (p-value from an analysis of variance [ANOVA] model) with the magnitude of the change, enabling quick visual identification of those data points (genes) that display large magnitude changes that are also statistically significant. The volcano plot in Figure 1B was constructed by plotting the negative log of the log10 FDR value on the y-axis (usually base 10). This results in data points with low log10 FDR values (highly significant) appearing toward the top of the plot. The x-axis is the logFC between the two conditions (wild and mutant zebrafish).
An MA plot is an application of a Bland–Altman plot for a visual representation of genomic data. The MA plot in Figure 1C visualizes the differences between measurements taken in wild and mutant zebrafish DEGs, by transforming the data into M (log ratio) and A (mean average) scales logCPM (counts per million) and logFC), then plotting these values. The red color indicates the significant genes, and the black color indicates non-significant genes. We predicted the coding sequence length base pair (bp), transcript length (bp), genome span (bp), 5′ untranslated region (UTR) length (bp), 3′ UTR length (bp), and percentage of the GC (Guanine Cytocine) content of DEGs (Figure 2). The gene density of an organism’s genome is the ratio of the number of genes per number of base pairs. Figure 2 shows all parts of the genes, such as the coding sequence length, transcript length, genome span, 5′ UTR length, 3′ UTR length, and percentage of GC content compared with the zebrafish genome’s density. The results revealed that all the parts of the gene’s density (List, DEGs) fluctuate compared with the zebrafish genome. Also, we predicted the distribution of DEGs on zebrafish chromosomes (genome-wide distribution), distribution of gene type, number of exons (coding genes), and number of transcript isoforms per coding gene (Figure 3). Figure 3A revealed that all the query genes were distributed on 25 chromosomes, and the mitochondria genome of zebrafish with the exeption of chromosome 4. This is because the long arm of chromosome 4 is unique among the zebrafish genomic regions owing to its relative lack of protein-coding genes and extensive heterochromatin. Figure 3B shows that the protein-coding (mRNA) was more distributed than the others, while Figure 3C shows that exon 4 is more distributed among the number of genes, and Figure 3D shows that one and two transcripts per gene are equally distributed among the number of genes.
Functional annotation
All the DEGs were uploaded to the GO Enrichment Analysis tool and database for annotation, visualization and integrated discovery (DAVID) tool using the complete zebrafish genome as the background. The molecular functions (MFs), biological processes (BPs), cellular components (CCs), and pathways were predicted in the significantly enriched GO terms of the differentially express genes (Figure 4). The DEGs were involved in various MFs, such as small molecule binding (GO: 0036094; FDR = 6.33e−11), nucleotide binding (GO: 0000166; FDR = 6.52e−11), nucleoside phosphate binding (GO: 1901265; FDR = 6.52e−11), cation-transporting ATPase activity (GO: 0019829; FDR = 1.34e−08), ATPase-coupled ion transmembrane transporter activity (GO: 0042625; FDR=1.34e−08), active ion transmembrane transporter activity (GO: 0022853; FDR = 2.33e−08), purine nucleotide binding (GO: 0017076; FDR = 2.43e−08), purine ribonucleotide binding (GO: 0032555; FDR = 3.78e−08), nucleoside binding (GO: 0001882; FDR = 3.78e−08), and ribonucleotide binding (GO: 0032553; FDR = 4.45e−08) functions. Among these MFs, most of the genes are involved in small molecule binding (171 genes) and nucleoside phosphate binding (164 genes) functions.
The DEGs are involved in various BPs, such as embryo development (GO: 0009790; FDR = 5.81e−11), retina development in camera-type eyes (GO: 0060041; FDR = 2.48e−10), system development (GO: 0048731; FDR=2.48e−10), embryo development ending in birth or egg hatching (GO: 0009792; FDR = 2.48e−10), chordate embryonic development (GO: 0043009; FDR = 2.48e−10), animal organ development (GO: 0048513; FDR = 6.93e−10), eye development (GO: 0001654; FDR = 2.44e−09), camera-type eye development (GO: 0043010; FDR = 5.81e−09), sensory organ development (GO:0007423; FDR = 1.62e−07), and the cellular developmental process (GO: 0048869; FDR = 9e−07). Among these BPs, most genes are involved in system development (184 genes), animal organ development (141 genes), and cellular developmental processes (121 genes).
The DEGs are involved in various CCs, such as the macromolecular complex (GO: 0032991; FDR = 0e+00), cytosol (GO: 0005829; FDR = 2.18e−12), cell periphery (GO: 0071944; FDR = 1.61e−11), plasma membrane (GO: 0005886; FDR = 1.61e−11), protein complex (GO: 0043234; FDR = 1.61e−11), membrane protein complex (GO: 0098796; FDR = 9.62e−11), neuron part (GO: 0097458; FDR = 3.74e−10), plasma membrane part (GO: 0044459; FDR = 3.84e−10), whole membrane (GO: 0098805; FDR = 5.66e−10), and non-membrane-bounded organelle (GO: 0043228; FDR = 1.24e−08) functions. Most genes are involved in the macromolecular complex (169 genes), cell periphery (107 genes), and plasma membrane (105 genes).
Pathway analysis
Pathway analysis helps elucidate data from canonical prior knowledge structured in the form of pathways. It allows finding distinct cell processes, diseases, or signaling pathways that are statistically associated with the selection of DEGs [14]. The DEGs are further analyzed in the pathway functional analysis using the DAVID annotation tool. They are involved in various KEGG pathways, such as the phototransduction (12 genes), mRNA surveillance (17 genes), phagosome (25 genes), glycolysis/gluconeogenesis (15 genes), adrenergic signaling in cardiomyocytes (29 genes), ribosome (20 genes), citrate cycle (TCA cycle; 8 genes), insulin signaling (24 genes), oxidative phosphorylation (20 genes), and RNA transport (22 genes) pathways. Most genes are involved in adrenergic signaling in the cardiomyocyte (29 genes), phagosome (25 genes), insulin signaling (24 genes), and RNA transport pathways (20 genes; Figure 5).
In this study, we focus on the phototransduction (dre04744; Table 1), phagosome (dre04145; Table 2), glycolysis/gluconeogenesis (dre00010; Table 3) and insulin signaling pathways (dre04910; Table 3).
Gene network analysis
The DEGs were used to construct gene–gene interactions using the STRING tool (https://string-db.org/), which also hides the disconnected nodes in the network. The results showed the analyzed number of nodes (426), expected number of edges (1,235), number of edges (1,512), average node degree (7.1), average local clustering coefficient (0.363), and Protein-protein interaction (PPI) enrichment p < 1.58e−14. We constructed the gene–gene network for DEGs with their respective minimum required interaction score (0.400). We mainly highlighted different colors for the phagosome (red), glycolysis/gluconeogenesis (blue), and insulin signaling (green) pathway genes’ interaction in the main network (Figure 6).
The primary network converted four subnetworks, namely, those of the phototransduction (red), phagosome (blue), glycolysis/gluconeogenesis (green), and insulin signaling (yellow) pathway genes. The phototransduction pathway (red) subnetwork showed the number of nodes as 11, expected number of edges as 1, real number of edges as 32, average node degree as 5.82, average local clustering coefficient as 0.743, and PPI enrichment as p < 1.0e−16. The subnetwork results suggested that all the genes involved were directly connected and involved in the phototransduction pathway (Figure 7A). The phagosome pathway (blue) genes’ subnetwork results showed the number of nodes as 25, expected number of edges as 16, real number of edges as 83, average node degree as 6.64, average local clustering coefficient as 0.803, and PPI enrichment as p < 1.0e−16. This subnetwork genes’ interaction results showed that the cybb gene did not interact with any genes, but the remaining genes connected directly or indirectly to each other. This gene (cybb) may be involved individually in the phagosome pathway (Figure 7B). The glycolysis/gluconeogenesis pathway (green) subnetwork showed the number of nodes as 15, expected number of edges as 3, real number of edges as 65, average node degree as 8.67, average local clustering coefficient as 0.741, and PPI enrichment as p < 1.0e−16. These subnetwork results suggested that all the genes involved were directly connected and involved in the glycolysis/gluconeogenesis pathway (Figure 7C). The insulin signaling pathway (yellow) subnetwork showed the number of nodes as 23, expected number of edges as 44, real number of edges as 87, average node degree as 5.57, average local clustering coefficient as 0.585, and PPI enrichment as p < 6.47e−09. These subnetwork results suggested that the flot2a and mknk2b genes are not connected to any genes, but the rest are connected directly or indirectly (Figure 7D). The flot2a and mknk2b genes are involved in the insulin signaling pathway individually in the subnetwork pathway. Overall, the gene network results suggested that most of the pathway genes interacted directly or indirectly with each other and were involved in specific cascade signaling pathways in retinal degeneration.