Gene Expression Analysis of RNA-Seq Data from Human Osteoarthritis Knee

DOI: https://doi.org/10.21203/rs.3.rs-1939694/v1

Abstract

Osteoarthritis has always been one of the most researched degenerative disorders pervasively. Moreover, this is been classified under complex disorders with an emphasis on the clinical analysis for the older people. However, a genomic understanding of the hereditary impact of this disease seems lacking. The basic genomic analysis has identified various biomarkers, but lacks prognosis. In this direction, we have retrieved publicly available RNA- seq data from the NCBI Sequence Read Archive (SRA). Utilising the Galaxy browser, the data analysis was performed to identify the differentially expressed genes from SRP IDs

{ERP105501, SRP322624}. The overlaps of these studies revealed 105 upregulated and 127 downregulated genes. These sets were further processed for multi-omic analysis through the databases. We identified specific process involvement through the Gene Ontology followed by hub genes through Protein -protein interactions. In addition to this, the transcription factors were identified from the list through BART search and miRNAs interacting with the identified hub genes through miRNet. This study aims to perform genomic analysis of osteoarthritis by identifying the differentially expressed genes among multiple tissues from human osteoarthritis knee joint and further analyse to find the key genes, biological pathways, transcription factors and miRNAs that can be considered as hub genes of the Osteoarthritis disease in humans.

1. Introduction

Osteoarthritis (OA) is a degenerative joint disease that can damage different joint tissues. It is without a doubt the most common kind of arthritis. It is a very complex and multifaceted disease and is impacted by a variety of distinct pathogenetic pathways in both its onset and progression. Osteoarthritis is a degenerative disease that over the time causes the cartilage to gradually vanish. Being the most prevalent kind of arthritis, it is the primary cause of joint pain and inflammation in older citizens around the world, a major cause of disability, and it is fairly common in society. Arthroplasty can treat it when it has developed to its most

severe stage of symptoms. The knee is one of the joints that is most frequently affected by osteoarthritis. Knee osteoarthritis makes it difficult to walk, climb stairs, and enter and exit chairs and bathtubs because it produces stiffness, edoema, and pain. Osteoarthritis of the knee may lead to disability. In spite of its diverse clinical manifestations, it appears that some types of OA are heritable. Treatment for osteoarthritis must be multidisciplinary and adapted to the patient's needs.

The pathogeny of osteoarthritis is particularly complicated due to the interplay of a wide range of general, metabolic, and local variables, as well as a proliferative response of subchondral bone and synovial inflammation, which lead to the degradation of the affected joints. Osteoarthritis is seen as a heterogenous group that is more than just an age-related disease, making its diagnosis unpredictable with actual potential therapies [1]. Because of the numerous gene interactions, OA genetics deviates from the typical mendelian inheritance pattern. The onset, course, and severity of the disease may be influenced by a number of gene factors that interact with the numerous alterations brought on by a number of genes. It has been established that between 35 and 65 percent of this disease's susceptibility is genetic. OA thus better fits the complex, multifaceted class of common disorders.

Therefore, genetic analysis in OA becomes crucial for better understanding of the disease's epidemiology. The fast development of next-generation sequencing technology for high-throughput genomics has made it possible for bioinformatic analysis to find important genes, signalling pathways, transcriptional factors, and miRNAs. By locating genes involved in disease risk or progression, we can gain a better understanding of the molecular causes of OA and possibly uncover new therapy options. By identifying sets of genetic changes associated with disease risk or the development of OA, it will be possible to identify individuals at high risk and better track disease progression.

Gene expression profiling is an effective technique for analysing and predicting toxicological processes. RNA-Seq technology has become a competitive alternative to traditional microarray techniques for transcriptional profiling. Using RNA-Sequencing, we can uncover new phenotypes of OA with specific markers, revise existing classifications, and gain a better understanding of the endogenous heterogeneity of cells and tissues in human OA cartilage. RNA-Seq also detects additional DEGs and related canonical pathways in addition to the majority of transcriptome alterations detected by microarrays. Greater dynamic range made possible by RNA-Seq improves sensitivity and accuracy as well as the possibility to spot changes in non-coding gene expression that could provide significant new findings.

2. Materials And Methods

2.1 DATA CURATION

The data for “osteoarthritis” was searched in GEO using the combination of filters “gene expression”, “rna-seq”, and “human”. Experiments which included samples of osteoarthritis candidates and non-osteoarthritis candidates were considered for better results of differentially expressed genes.

2.2 DATA ACCESS AND PROCESSING

The screened data samples were imported into the Galaxy Browser [5] for further analysis using the tool “Faster Download and Extract Reads in FASTQ” [6]. This tool pulls data from the Short Read Archive (SRA) from the National Center for Biotechnology Information in fastq format (NCBI). It is founded on the SRA Toolkit's fasterq-dump function.

The imported sample read files were then pre-processed for adapter removing and much more quality filter processes using the “fastp” (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.23.2+galaxy0 ) tool [7]. The tool fastp is intended to offer quick all-in-one pre-processing for FASTQ files. To achieve great performance, this utility was created in C + + and supports multithreading.

2.3 MAPPING OF READS TO THE GENOME

We must first identify the sequences' genomic origin in order to determine which genes they belong to in order to make sense of the reads. Aligning or "mapping" the reads to the reference is the method used when there is a reference genome for the organism. A reference genome, also known as a reference assembly, is a collection of nucleic acid sequences compiled to serve as a model for the genetic makeup of a species. They frequently consist of a mosaic of various nucleic acid sequences from each individual rather than the exact gene set of any particular organism since they are assembled from the sequencing of various individuals rather than that of a single organism.

We used the tool called HISAT2 (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1) [8] to perform genome mapping. A sensitive and quick spliced alignment programme is

HISAT. As part of HISAT, we have created a brand-new indexing method called hierarchical indexing that uses two different types of indexes: (1) one global FM index representing the entire genome, and (2) numerous separate local FM indexes for small regions collectively covering the genome. Hierarchical indexing is based on the Burrows-Wheeler transform (BWT) and the FM index. HISAT offers a number of alignment techniques created especially for mapping various RNA-seq read types. Together, these factors allow HISAT to align reads incredibly quickly and sensitively, especially those spanning two or more exons.

2.4 COUNTING READS PER GENES

Counting the number of reads that map to each gene is the simplest way to calculate the amount of each gene's expression using RNA-seq (read count). A gene transfer format (GTF) file comprising gene models is used in this gene-level quantification method. Each model represents the structure of the transcripts made by a certain gene. Quantifying the number of reads per gene, or more precisely the number of reads mapping to each gene's exons, is a crucial initial step in comparing the expression of individual genes under various situations (such as with or without OA disease). The lightweight read counting tool FeatureCounts (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/featurecounts/featurecounts/2.0.1 + galaxy2) [9] was created solely in the C programming language. In SAM/BAM files, it can be used to quantify both gDNA-seq and RNA-seq reads for genomic features. A component of the subread package is featureCounts. By choosing the built-in option, we can use featureCounts' built-in annotations.

2.5 DIFFERENTIALLY EXPRESSED FEATURES

A gene is said to be differentially expressed if there is a statistically significant difference or change in read counts or expression levels between two experimental conditions. In this step, we identified the genes which are differentially expressed between the normal samples and diseased samples of OA. When working with RNA-seq data and conducting Differential Gene Expression (DGE) analysis, DESeq2 (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/deseq2/deseq2/2.11.40.7

+galaxy1) is a fantastic tool. The read count files from several samples are combined, normalised for sequencing depth and library composition, and then put into a large table with genes in the rows and samples in the columns [10].

The differentially genes are collected along with a unique gene ID which shall be used to get the gene name. This is where we annotate the differentially expressed genes for their

gene names for better identification and references across future analysis. The “Annotate DESeq2/DEXSeq output tables” (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/deg_annotate/deg_annotate/1.1.0) adds gene symbols, biotypes, locations, and other information to the DESeq2/edgeR/limma/DEXSeq output table. The data you want to include can be customised. These details ought to be included in the input GTF/GFF file as characteristics of the feature we select.

2.6 GENE ONTOLOGY

The most comprehensive source of knowledge on gene functions is the Gene Ontology (GO) knowledgebase (http://geneontology.org/). Large-scale molecular biology and genetics investigations in biomedical research can be computationally analysed on the basis of this knowledge, which is both machine- and human-readable. The current best representation of the "world" of biology is described by the gene ontology [11], which is a network of biological classifications. From this study, it is possible to infer the potential molecular functions, cellular locations, and activities that each gene product may engage in.

2.7 PREDICTION OF TRANSCRIPTION FACTORS

A transcription factor (TF) is a DNA-binding protein that binds to certain DNA sequences and modifies the transcription of a group of specific genes, hence regulating gene expression in the cell. We used BART tool to predict the transcription factors associated with each of our gene lists. When a query cis-regulatory profile is derived from a gene set or a ChIP- seq dataset, BART (www.bartweb.org) identifies transcription factors whose genomic binding profile correlates with that profile [12][13].

2.8 PROTEIN-PROTEIN INTERACTION ANALYSIS

An organism's protein-protein interaction (PPI) network acts as the skeleton for its signalling circuitry, which controls how cells react to environmental and genetic signals. The prediction of gene function and cellular behaviour in response to various inputs may be improved by understanding this circuitry.

3. Results And Discussion

3.1 DATA CURATION

 

From the search results, 16 samples [8 osteoarthritis (biological replicates) and 8 non- osteoarthritis] collected from the Posterior Lateral Condyle of knee joint (ERP105501) and 6 samples [3 osteoarthritis and 3 non-osteoarthritis] collected from the Synovial Tissue of knee joint (SRP322624) were selected and imported into galaxy browser.

The transcriptome response of osteoarthritis in cartilage and possible patient subgroups were investigated using large-scale RNA-Seq analysis. The posterior lateral condyle of the knees of 60 patients with osteoarthritis (OA) who underwent complete knee replacement surgery and 10 control non-OA patients who underwent amputation were studied for data collection.

Pigmented villonodular synovitis (PVNS) is a rare disorder that causes significant joint damage and has a high recurrence rate even after surgical resection. It is caused by benign synovial tissue growth. However, a lack of knowledge about the pathophysiology prevents effective treatment. overall layout: In order to examine the PVNS transcriptome, the gene expression patterns of three patients with PVNS and three patients with osteoarthritis (OA) were investigated using integrated RNA sequencing (RNA-seq) and microarray.


Table 4.1.1 Metadata of the selected sample

 

 

SRP322624

 

Run

Assay Type

AvgSpotLen

Bases

Experiment

subject_status

tissue

1

SRR14728272

RNA-Seq

180

4353820200

SRX11065282

osteoarthritis

Synovial tisssue

2

SRR14728273

RNA-Seq

180

4353711660

SRX11065283

osteoarthritis

Synovial tisssue

3

SRR14728274

RNA-Seq

180

4353738480

SRX11065284

osteoarthritis

Synovial tisssue

4

SRR14728275

RNA-Seq

180

4353815520

SRX11065285

Pigmented villonodular synovitis

Synovial tisssue

5

SRR14728276

RNA-Seq

180

4353780780

SRX11065286

Pigmented villonodular synovitis

Synovial tisssue

6

SRR14728277

RNA-Seq

155

3755632680

SRX11065287

Pigmented villonodular synovitis

Synovial tisssue

 

ERP105501

 

Run

Assay Type

AvgSpotLen

Bases

Experiment

subject_status

tissue

1

ERR2209860

RNA-Seq

202

3510730508

ERX2264218

osteoarthritis\, knee

posterior lateral condyle of femur

2

ERR2209861

RNA-Seq

202

3495384770

ERX2264218

osteoarthritis\, knee

posterior lateral condyle of femur

3

ERR2209866

RNA-Seq

 

 

ERX2264221

osteoarthritis\, knee

posterior lateral condyle of femur

4

ERR2209881

RNA-Seq

202

2505061388

ERX2264236

osteoarthritis\, knee

posterior lateral condyle of femur

5

ERR2209865

RNA-Seq

 

 

ERX2264221

osteoarthritis\, knee

posterior lateral condyle of femur

6

ERR2209882

RNA-Seq

 

 

ERX2264236

osteoarthritis\, knee

posterior lateral condyle of femur

7

ERR2209892

RNA-Seq

 

 

ERX2264245

osteoarthritis\, knee

posterior lateral condyle of femur

8

ERR2209891

RNA-Seq

202

7475716798

ERX2264245

osteoarthritis\, knee

posterior lateral condyle of femur

9

ERR2209870

RNA-Seq

 

 

ERX2264225

vascular disease

posterior lateral condyle of femur

10

ERR2209883

RNA-Seq

197

3065893568

ERX2264237

vascular disease

posterior lateral condyle of femur

11

ERR2209896

RNA-Seq

 

 

ERX2264249

vascular disease

posterior lateral condyle of femur

12

ERR2209858

RNA-Seq

 

 

ERX2264216

vascular disease

posterior lateral condyle of femur

13

ERR2209884

RNA-Seq

 

 

ERX2264238

vascular disease

posterior lateral condyle of femur

14

ERR2209893

RNA-Seq

 

 

ERX2264246

vascular disease

posterior lateral condyle of femur

15

ERR2209894

RNA-Seq

202

4180762286

ERX2264247

vascular disease

posterior lateral condyle of femur

16

ERR2209871

RNA-Seq

202

3590830780

ERX2264226

vascular disease

posterior lateral condyle of femur


The samples were further pre-processed for quality improvement in fastp and considered for mapping. Reads with reduced quality features such as very low quality, very short read length were removed. Adapter sequences were automatically detected by fastp for paired end data and were removed. Mismatching base pairs in overlapping region of paired ends were also removed. From the fastQC output data, all samples had a sequence duplication levels close to 70%. But, the number of duplicate reads were substantially higher than what can be inferred from read coverage data. This problem is even worse for RNA-seq data, which are frequently dominated by the transcripts of a small number of genes. For RNA-seq data, the exceptionally high read coverage for the specific highly expressed transcripts can easily result in fastQC read duplication levels of 70% or greater [14].

3.2 MAPPING OF READS TO THE GENOME

 

The output of HISAT2 were very significant as on an average, above 98% of all reads across all samples were mapped to the genome. The human genome GRCh38/hg38 was used as the reference genome for mapping the reads. The index file for this genome version was available as a built-in in the HISAT2 tool and was used for quick and sensitive mapping.


Table 4.2.1 List of all samples' read mapping levels

 

Condition

Sample Name

Reads (millions)

Reads Mapped (millions)

Percentage Mapped

PLC

OA Sample 1

53.8

53.6

99.6282528

PLC

OA Sample 2

53.9

53.7

99.6289425

PLC

OA Sample 3

39.3

39.1

99.4910941

PLC

OA Sample 4

39.1

38.9

99.488491

PLC

OA Sample 5

27.3

27.2

99.6336996

PLC

OA Sample 6

27.3

27.1

99.2673993

PLC

OA Sample 7

80.5

80.1

99.5031056

PLC

OA Sample 8

78.8

78.5

99.6192893

PLC

Normal Sample 1

38.2

38

99.4764398

PLC

Normal Sample 2

43

42.9

99.7674419

PLC

Normal Sample 3

38.7

38.5

99.4832041

PLC

Normal Sample 4

33.6

32.7

97.3214286

PLC

Normal Sample 5

43.3

41.3

95.3810624

PLC

Normal Sample 6

78.4

78.1

99.6173469

PLC

Normal Sample 7

45.5

43

94.5054945

PLC

Normal Sample 8

114.3

113.5

99.3000875

Synovial

OA Sample 1

58.7

57.9

98.637138

Synovial

OA Sample 2

54.7

54

98.7202925

Synovial

OA Sample 3

54.7

53.9

98.5374771

Synovial

Normal Sample 1

56.3

55.5

98.5790409

Synovial

Normal Sample 2

54.4

53.6

98.5294118

Synovial

Normal Sample 3

48.7

48

98.5626283


3.3 COUNTING READS PER GENES

 

The featureCounts output summarized on an average of 78% assignment of reads across all samples. Of those Assigned reads, 98% of reads were mapped exactly once to the reference genome. Since only proportions under 70% should be checked for potential contamination, we can move on to the analysis. Less than 20% of the reads from all the samples were mapped to multiple locations on the reference genome. For next-generation sequencing technology short reads, this falls within the expected range.

3.4 DIFFERENTIALLY EXPRESSED FEATURES

 

The DeSeq2 tool was used to perform estimation on the differential expression of genes across all samples. As a result, we arrived at 61,498 unique lines both from Synovial tissue samples and PLC samples that can identify unique genes and also the expression levels of the gene by various parameters.

 

Internal normalisation is carried out by DESeq2 in which the geometric mean for each gene across all samples is determined. Then, this mean is divided by the gene counts in each sample. The size factor for a sample is the median of these ratios in that sample. Estimates of dispersion have a direct relationship with variance and and inverse relationship with mean. The dispersion is higher for small mean counts and smaller for large mean counts based on this connection. The variance of the genes will be the only factor affecting the dispersion estimates for genes with the same mean. As a result, for a given mean value, the dispersion estimates represent the variance in gene expression. Figure 4.5.1best describe the data fit to the model and thus the data we consider is proven fit to perform differential expression analysis.

 

Not all the lines out of DeSeq2 shall be statistically sound. It is important to screen the list to potential entries only. One of the columns in the output is the Fold change in log2. The input sequence of factor levels is crucial because the log2 fold changes are based on the principal factor level 1 vs factor level 2 comparison. In this instance, DESeq2 calculates fold changes of "Osteoarthritis" samples against "normal" from the first factor "Osteoarthritis," where the values represent up- or downregulation of genes in Osteoarthritis samples. p-value is the measure of the statistical significance of this change, therefore we filter the entry lines for significant p-value<0.05. Out of 61,498 lines, only 30,375 (the rest being scaffold genes in the chr.gtf file which was used because it doesn’t include unplaced or unlocalized contigs, but only assembled chromosomes) were annotated as genes belonging to Homo Sapiens. Further applying the foresaid filters, 3,305 lines (5.37%) were kept in Synovial data and 4,941 lines (8.03%) were kept in PLC data. Now, we will only choose genes with a fold change (FC) of greater than 2 or less than 0.5. We filter for "(abs(log 2 FC) > 1" (which denotes FC > 2 or FC 0.5) because the DESeq2 output file contains "(log 2 FC)," rather than FC itself. From 3,305 lines, this filter brought down the lines to 2394 (72.44%) in Synovial data and 2,239 (45.31%) in PLC data.

All the 2394 and 2239 gene IDs from the final list were annotated with reference to the human genome GRCh38/hg38 and the gene name was appended to each line. Now we have a list of unique genes differentially expressed in the sample classes. Out of 2394 genes in OA Synovial tissues, 1159 genes were estimated to be upregulated and 940 were downregulated. Likewise, in OA PLC tissues, 662 were upregulated and 1087 were downregulated.

Venn diagrams [15] were drawn to find the overlapping genes between the two data classes and arrived at 105 overlapping upregulated genes and 127 overlapping downregulated genes as seen in Figure 4.5.3. The list of these genes was extracted for further analysis based on upregulated and downregulated genes.


3.5 GENE ONTOLOGY

Gene ontology enrichment analysis for biological process listed that for the upregulated gene set, 20 biological processes were implicated while for the downregulated gene set, as large as 269 GO terms were implicated. Given the percentage of genes in the entire genome that are assigned to a given GO dataset, the P-value is the likelihood of the query genes from the reference dataset. In other words, the background distribution of annotation is compared to the GO keywords shared by the genes in the query list. The p-value was set to 95% confidence and was transformed -log10() form and the corresponding GO terms with the most significant adjusted p-values were screened. From the sorted list of GO terms with high significance, GO terms that come under specific growth and development processes were selected as biomarkers (figure 4.6.1 and figure 4.6.2) since osteoarthritis is a continuous degenerative disease.

3.6 PREDICTION OF TRANSCRIPTION FACTORS

 

BART displayed 918 Transcription factors regulating the expression of both the gene sets of our study. 4 proteins in the downregulated gene set and 5 in the upregulated gene set act as transcription factor in our OA phenotype and their significance is validated with their corresponding z-scores.


Table 4.7.1 Transcription Factors among our gene sets regulating expression of key genes as Biomarkers

 

Transcription Factor

Wilcoxon test statistic

P - Value

Z - Score

Expression

POU2F2

2.792

2.62E-03

3.2

down

RAD51

1.57

5.82E-02

2.663

down

BATF

3.061

1.10E-03

2.423

down

HLF

1.285

9.94E-02

2.277

up

HNF4G

2.322

1.01E-02

2.107

up

CD74

2.718

3.29E-03

1.842

down

FOS

-0.855

8.04E-01

0.045

up

ZNF711

-1.366

9.14E-01

-2.154

up

EGR1

-1.534

9.37E-01

-2.547

up


3.7 PROTEIN-PROTEIN INTERACTION ANALYSIS

 

22 interactions among the upregulated gene set and 105 interactions among the downregulated gene set were found from the PPI network analysis using STRING. Out of these FOS(4), GRIA2(4), ERBB4(4) were the highly interacted proteins and thus considered as hub genes, similarly in the downregulated gene set MELK(11), ASF1B(10), FOXM1(9) considered as hub genes.

4. Conclusion

This study focused on finding differentially expressed genes and analyse it to arrive at potential biomarkers of the osteoarthritis disease. To derive significant results, samples from different tissues was considered [16] and samples from Poster Lateral Condyle and Synovial Tissues were curated. These samples were analysed for Differential Expression of Genes in Galaxy browser and 105 overlapping upregulated genes and 127 downregulated genes were obtained initially. Zhao et al. had conducted RNA-Seq analysis on the Synovial samples and had identified 1220 genes differentially expressed between OA and PVNS groups [17]. Similarly, Soul J et al. compared samples of OA and non-OA to identify 2692 genes differentially expressed [18]. DGEs from our study were compared to the available data from Soul J et al. and we found our resulting DGEs were in accordance with their work as 28 genes remain as significant DGEs in common. Gene Ontology Enrichment Analysis were performed and more than 130 GO terms were collected for both gene sets combined. Of all the GO terms, terms that identify from growth and metabolism pathways with corresponding significance were selected and concluded as key pathways of osteoarthritis: SMAD protein signal transduction (GO:0060395), cell surface receptor signaling pathway (GO:0007166), response to growth factor (GO:0070848), positive regulation of cytokine production (GO:0001819), regulation of leukocyte activation (GO:0002694), cell surface receptor signalling pathway (GO:0007166), cellular response to organic substance (GO:0071310), positive regulation of interferon-gamma production (GO:0032729). Zhao et al. had declared leukocyte activation as the most significant GO term while there are no mutual conclusions from Soul J et al. By identifying the TFs regulating the two gene sets and finding those proteins in the gene sets (POU2F2, RAD51, BATF, CD74; HLF, HNF4G, FOS, ZNF711, EGR1) lead us to identify the

significant TFs as biomarkers. PPI Network analysis were performed to map the interactions within the gene sets and were detected for two hub genes (MELK, ASF1B, FOXM1; ERBB4, FOS, GRIA2) which can potentially impact in the osteoarthritis disease. To identify the key genes of both the samples, we filtered the top 10% of significant DGEs and the overlapping genes (ALDH1L1-AS2, CLCNKA, APOD, EMILIN3) can be considered as gene biomarkers.

Declarations

Conflict of Interest: None

 

 

Author’s Contribution:  KK designed the work . NR and EE carried out the work.

 

 

Acknowledgement: We sincerely acknowledge the bioinformatics facility provided by Bioinformatics department, SASTRA University.

References

1.            Fernandez-Moreno, M., Rego, I., Carreira-Garcia, V., & Blanco, F., 2008. Genetics in Osteoarthritis. Current Genomics, 9(8), 542-547

 

2.            Valdes, A., & Spector, T., 2008. The Contribution of Genes to Osteoarthritis. Rheumatic Disease Clinics of North America, 34(3), 581-603

 

3.            Williams, C. J., & Jimenez, S. A., 1993. Heredity, genes and osteoarthritis. Rheumatic diseases clinics of North America, 19(3), 523–543

 

4.            Paras Hospitals., 2022. Retrieved 8 July 2022, from https://www.parashospitals.com/blogs/osteoarthritis-knee

 

5.            Afgan, E., Nekrutenko, A., Grüning, B., Blankenberg, D., Goecks, J., & Schatz, M. et al., 2022. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Research, 50(W1), W345–W351

 

6.            Leinonen, R., Sugawara, H., & Shumway, M., 2010. The Sequence Read Archive. Nucleic Acids Research, 39(Database), D19-D21

 

7.            Chen, S., Zhou, Y., Chen, Y., & Gu, J., 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884–i890

 

8.            Kim, D., Langmead, B., & Salzberg, S. L., 2015. HISAT: a fast spliced aligner with low memory requirements. Nature Methods, 12(4), 357–360

 

9.            Liao, Y., Smyth, G. K., & Shi, W., 2013. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7), 923–930

 

10.         Love, M. I., Huber, W., & Anders, S., 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12)

 

11.         Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, Thomas PD. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. Jan 2019, 47(D1), D419–D426

 

12.         Ma, W., Wang, Z., Zhang, Y., Magee, N., Feng, Y., & Shi, R. et al., 2021. BARTweb: a web server for transcriptional regulator association analysis. NAR Genomics and Bioinformatics, 3(2)

 

13.         Wang, Z., Civelek, M., Miller, C., Sheffield, N., Guertin, M., & Zang, C., 2018. BART: a   transcription   factor   prediction   tool   with   query   gene   sets    or    epigenomic profiles. Bioinformatics, 34(16), 2867–2869

 

14.         Why does FASTQC show unexpectedly high sequence duplication levels (PCR-duplicates)? DNA                            Technologies      Core.,    2022.     Retrieved             10           July               2022,     from https://dnatech.genomecenter.ucdavis.edu/faqs/why-does-fastqc-show-unexpectedly-high- sequence-duplication-levels-pcr-duplicates/

 

15.         Heberle, H., Meirelles, G., da Silva, F., Telles, G., & Minghim, R., 2015. InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams. BMC Bioinformatics, 16(1)

 

16.         Chang, L., Yao, H., Yao, Z., Ho, K., Ong, M., & Dai, B. et al., 2021. Comprehensive Analysis of Key Genes, Signaling Pathways and miRNAs in Human Knee Osteoarthritis: Based on Bioinformatics. Frontiers In Pharmacology, 12

 

17.         Soul, J., Dunn, S., Anand, S., Serracino-Inglott, F., Schwartz, J., Boot-Handford, R., & Hardingham, T., 2017. Stratification of knee osteoarthritis: two major patient subgroups identified by genome-wide expression analysis of articular cartilage. Annals Of the Rheumatic Diseases, 77(3), 423-423

 

18.         Zhao, Y., Lv, J., Zhang, H., Xie, J., Dai, H., & Zhang, X. (2021). Gene Expression Profiles Analyzed Using Integrating RNA Sequencing, and Microarray Reveals Increased Inflammatory Response, Proliferation, and Osteoclastogenesis in Pigmented Villonodular Synovitis. Frontiers In Immunology, 12