3.1 FastQC quality reports
Quality control of the Mus musculus ChIP-Seq reads with SRA accession numbers DRR311742 and DRR311743 was performed with FastQC. The FastQC results are shown in Fig. 1(a) for DRR311742 and in Fig. 1(b) for DRR311743.
3.2 Bowtie2
The Mus musculus ChIP-Seq reads with SRA accession numbers DRR311742 and DRR311743 were mapped to the Mus musculus reference sequence using Bowtie2 (Fig. 2).
3.3 Collect Alignment Summary Metrics
We used the Collect Alignment Summary Metrics tool to summarize the mapping described above. Table 1 contains the alignment summary for DRR311742 and Table 2 contains that for DRR311743.
Next, we removed PCR duplicates using RmDup. Tables 3 and 4 contain the alignment summaries for DRR311742 and DRR311743, respectively, after removal of PCR duplicates.
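The idea behind duplicate removal can be illustrated with a minimal Python sketch. It collapses reads that share the same chromosome, start position, and strand, and keeps the one with the highest mapping quality. This is a simplified stand-in and not the RmDup implementation; the `Read` record, the `remove_duplicates` function, and the toy reads are hypothetical and are not taken from DRR311742 or DRR311743.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a real BAM/SAM parser)."""
    name: str
    chrom: str
    start: int
    strand: str
    mapq: int


def remove_duplicates(reads):
    """Keep one read per (chrom, start, strand), preferring higher MAPQ."""
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        # Replace the stored read only if this one maps with higher quality.
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    # Return the survivors in coordinate order.
    return sorted(best.values(), key=lambda r: (r.chrom, r.start, r.strand))


# Toy input: r1 and r2 are duplicates of each other (same position and strand).
reads = [
    Read("r1", "chr1", 100, "+", 30),
    Read("r2", "chr1", 100, "+", 42),
    Read("r3", "chr1", 100, "-", 20),
    Read("r4", "chr2", 500, "+", 40),
]
unique_reads = remove_duplicates(reads)
print(len(reads), "reads before,", len(unique_reads), "reads after")
```

In this toy input, one of the four reads is a duplicate, so the read count drops after filtering, which is the same kind of reduction expected in an alignment summary computed after duplicate removal.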
3.4 MACS2 Callpeak
The alignment summaries (Tables 3 and 4) show fewer reads after RmDup, which indicates that duplicate reads were removed. Next, we used the MACS2 callpeak tool to identify regions of the genome that are enriched in aligned reads. Model-based Analysis of ChIP-Seq (MACS) is a commonly used tool for identifying transcription factor binding sites. The algorithm captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation. Here, MACS2 was run with a control sample (DRR311742), which increases the specificity of the peak calls (Fig. 3). MACS2 uses 1000 enriched regions to model the distance between the paired forward- and reverse-strand peaks (Fig. 4).
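The strand-distance model described above can be sketched in a few lines of Python. For each enriched region, the sketch takes the offset between the median reverse-strand tag position and the median forward-strand tag position, and then reports the median offset over all regions as the estimated distance d. This is a deliberately simplified illustration and not the MACS2 algorithm itself; the function name `estimate_fragment_size` and the tag positions are hypothetical.

```python
from statistics import median


def estimate_fragment_size(regions):
    """Estimate the forward/reverse peak distance d.

    regions: list of (forward_positions, reverse_positions) tuples, one per
    enriched region. Simplified illustration only, not the MACS2 model.
    """
    offsets = []
    for forward, reverse in regions:
        if not forward or not reverse:
            continue  # a region needs tags on both strands to be informative
        offsets.append(median(reverse) - median(forward))
    if not offsets:
        raise ValueError("no region has tags on both strands")
    return median(offsets)


# Hypothetical tag positions for three enriched regions.
regions = [
    ([100, 105, 110], [300, 305, 310]),
    ([1000, 1010, 1020], [1190, 1200, 1210]),
    ([5000, 5004, 5008], [5210, 5214, 5218]),
]
d = estimate_fragment_size(regions)
print("estimated distance d:", d)
```

Taking the median over many regions keeps a few atypical regions from dominating the estimate, which is one reason a model of this kind is built from a large set of enriched regions rather than from a single peak.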
3.5 Motif analysis
We then identified the motifs present in the myelofibrosis ChIP-Seq data using the SeqPos motif analysis tool. The file containing the top 100 most significant peaks, in BED format, was selected for motif identification (Table 5).
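Selecting the most significant peaks from a BED file can be sketched as follows. The sketch assumes that the significance score is stored in the fifth tab-separated column, as in the standard BED layout; the function name `top_peaks` and the example peaks are hypothetical and do not come from the data analysed here.

```python
def top_peaks(bed_lines, n=100):
    """Return the n highest-scoring BED records, best first.

    Assumes tab-separated BED lines with the score in column 5.
    """
    peaks = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        peaks.append((float(fields[4]), fields))
    # Sort by score, highest first, and keep the first n records.
    peaks.sort(key=lambda item: item[0], reverse=True)
    return ["\t".join(fields) for _, fields in peaks[:n]]


# Hypothetical peaks: chrom, start, end, name, score.
example = [
    "chr1\t100\t200\tpeak_1\t50",
    "chr1\t500\t650\tpeak_2\t320",
    "chr2\t900\t1000\tpeak_3\t75",
]
best_two = top_peaks(example, n=2)
for record in best_two:
    print(record)
```

With `n=100`, the same function would produce a file of the kind passed to the motif analysis step, provided the input contains at least 100 scored peaks.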
3.6 Swiss-Prot
Genes selected from the myelofibrosis genome were modeled using Swiss-Prot, and the corresponding Ramachandran plots were generated.
The receptor model and the corresponding Ramachandran plot are shown in Fig. 5. The accession number used for modeling is given in Table 6.