Mus musculus ChIP-seq reads with SRA accession numbers SRR13960849 and SRR13960850 were mapped to the Mus musculus reference genome using Bowtie2. The Bowtie2 mapping output is shown in Fig. 2 for SRR13960849 and in Fig. 3 for SRR13960850.
We then used the Collect Alignment Summary Metrics tool to summarize the mapping results. Table 1 contains the alignment summary for SRR13960849 and Table 2 that for SRR13960850.
Next, we removed PCR duplicates using RmDup. Tables 3 and 4 contain the alignment summaries for SRR13960849 and SRR13960850, respectively, after removal of PCR duplicates.
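The idea behind duplicate removal can be illustrated with a minimal sketch. It assumes a simplified rule in which single-end reads sharing the same chromosome, start position, and strand are treated as duplicates and only the read with the highest mapping quality is kept. The `Read` record, the function name, and the toy reads are hypothetical and are not taken from RmDup itself.

```python
from collections import namedtuple

# Hypothetical minimal read record; real alignments carry many more fields.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "mapq"])


def remove_duplicates(reads):
    """Keep one read per (chrom, pos, strand), preferring the highest MAPQ."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        # Replace the stored read only if this one has a strictly higher MAPQ.
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return list(best.values())


# Toy input: r1 and r2 share position and strand, so one of them is a duplicate.
reads = [
    Read("r1", "chr1", 100, "+", 30),
    Read("r2", "chr1", 100, "+", 42),
    Read("r3", "chr1", 100, "-", 42),
    Read("r4", "chr2", 500, "+", 10),
]
deduped = remove_duplicates(reads)
print(len(reads), "reads before,", len(deduped), "reads after")
```

In this toy input the read count falls from four to three, the same kind of reduction that is expected in an alignment summary after duplicate removal.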
According to the alignment summaries (Tables 3 and 4), the read counts are lower after RmDup, which indicates that duplicate reads were removed. Next, we used the MACS2 callpeak tool to identify regions of the genome that are enriched in aligned reads. Model-based Analysis of ChIP-seq (MACS) is a commonly used tool for identifying transcription factor binding sites. The algorithm accounts for the influence of genome complexity when evaluating the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation [22]. Here, MACS was run with a control sample (SRR13960849), which increases the specificity of the peak calls (Fig. 4). MACS2 uses 1000 enriched regions to model the distance between the paired forward- and reverse-strand peaks [23].
The cross-correlation plot in Fig. 5 is computed as the Pearson linear correlation between the Crick strand and the Watson strand at each shift value. Plotted against the shift value, this metric typically produces two peaks: one corresponding to the read length (the “phantom” peak) and the other to the average fragment length of the library.
The absolute and relative heights of these two peaks are useful for judging the success of a ChIP-seq experiment. A high-quality immunoprecipitation is characterized by a ChIP peak that is higher than the “phantom” peak, whereas failed experiments show a very small ChIP peak or none at all. Our results show a single high ChIP peak and no “phantom” peak, which indicates a high-quality immunoprecipitation [17].
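The strand cross-correlation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used to produce Fig. 5. It assumes per-base counts of read 5' ends on each strand of a single toy chromosome, and the function names, binding-site positions, and 50 bp fragment length are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def strand_cross_correlation(fwd_starts, rev_starts, length, max_shift):
    """Correlate forward-strand counts with reverse-strand counts at each shift."""
    fwd = [0] * length
    rev = [0] * length
    for pos in fwd_starts:
        fwd[pos] += 1
    for pos in rev_starts:
        rev[pos] += 1
    profile = {}
    for shift in range(max_shift + 1):
        # Compare forward position i with reverse position i + shift.
        profile[shift] = pearson(fwd[: length - shift], rev[shift:])
    return profile


# Toy data: three hypothetical binding sites, fragments of 50 bp centred on each.
sites = [100, 300, 600]
fwd_starts = [s - 25 for s in sites]   # 5' ends of forward-strand reads
rev_starts = [s + 25 for s in sites]   # 5' ends of reverse-strand reads

profile = strand_cross_correlation(fwd_starts, rev_starts, length=1000, max_shift=100)
best_shift = max(profile, key=profile.get)
print("shift with maximum cross-correlation:", best_shift)
```

In this idealized input the correlation peaks at the shift equal to the simulated fragment length. Real data would additionally show the read-length “phantom” peak, which arises from mappability effects that the toy example does not model.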
Next, we took the 100 most significant peaks from the annotated peaks table and identified the genes overlapping these peaks, which are listed in Table 5. The pathways associated with these genes were identified using KEGG pathways (Table 6).
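The selection of the most significant peaks and the overlap with genes can be sketched as follows. The sketch assumes that peaks are stored in the narrowPeak layout (ninth column holding the -log10 q-value) and that significance is ranked by that column. The ranking criterion, the function names, the toy peaks, and the gene names are all hypothetical, and the example keeps the top 2 peaks rather than the top 100 for brevity.

```python
def parse_narrowpeak(lines):
    """Parse tab-separated narrowPeak lines into dictionaries."""
    peaks = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        peaks.append({
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "name": fields[3],
            "neg_log10_q": float(fields[8]),  # column 9: -log10(q-value)
        })
    return peaks


def top_peaks(peaks, n):
    """Return the n peaks with the largest -log10 q-value."""
    return sorted(peaks, key=lambda p: p["neg_log10_q"], reverse=True)[:n]


def overlapping_genes(peaks, genes):
    """Return sorted gene symbols whose half-open interval overlaps any peak."""
    hits = set()
    for peak in peaks:
        for chrom, start, end, symbol in genes:
            if chrom == peak["chrom"] and start < peak["end"] and peak["start"] < end:
                hits.add(symbol)
    return sorted(hits)


# Toy narrowPeak records (hypothetical values).
lines = [
    "chr1\t100\t200\tpeak_1\t50\t.\t5.0\t8.0\t6.0\t40",
    "chr1\t1000\t1200\tpeak_2\t900\t.\t20.0\t95.0\t90.0\t100",
    "chr2\t500\t700\tpeak_3\t300\t.\t10.0\t35.0\t30.0\t80",
]
# Toy gene annotation: (chrom, start, end, symbol); names are placeholders.
genes = [
    ("chr1", 1100, 5000, "GeneA"),
    ("chr2", 650, 900, "GeneB"),
    ("chr1", 150, 400, "GeneC"),
]

selected = top_peaks(parse_narrowpeak(lines), n=2)
gene_hits = overlapping_genes(selected, genes)
print([p["name"] for p in selected], gene_hits)
```

Setting `n=100` on a real peaks table would give a top-100 selection of the kind described above, and the resulting gene list is what would be passed on to a pathway lookup.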
Finally, we identified the motifs present in our GABA ChIP-seq data using the SeqPos motif analysis tool. The file containing the 100 most significant peaks in BED format was used as input for motif identification (Tables 7 and 8).