intro-to-rnaseq-with-galaxy

Testing for Differential Expression using DESeq2

Much of this explanation has been adapted from these two sources:

The following DESeq2 steps are run all at once in Galaxy.


Create separate collections for the counts files for Mock 12 hr and HIV 12 hr.

First, we filter for the samples of interest:

Galaxy requires that lists have an identifier column.

Test for Differential Expression using DESeq2

DESeq2 will take the count tables that we generated, one per sample, and make a comparison for each gene between two conditions: HIV and Mock. The term that DESeq2 uses for this condition is “factor” and the ordering of our factor levels will determine how we interpret the resulting expression fold changes. Here, we’ll set Factor Level 1 to HIV and Factor Level 2 to Mock. Any resulting upregulated genes, with log2 fold change > 0, can then be interpreted as being upregulated in HIV samples with respect to Mock.
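To make the effect of factor level ordering concrete, here is a minimal Python sketch of how the sign of a log2 fold change flips when the two levels are swapped. The mean counts are hypothetical values chosen for illustration, not taken from the tutorial data, and the `log2_fold_change` helper with its pseudocount is an assumption of this sketch. DESeq2 itself estimates fold changes from a fitted negative binomial model rather than from a simple ratio of means.

```python
import math


def log2_fold_change(mean_level1, mean_level2, pseudocount=1.0):
    """Return log2(level 1 / level 2); positive means higher in level 1.

    The pseudocount is an illustrative guard against division by zero,
    not part of how DESeq2 computes its estimates.
    """
    return math.log2((mean_level1 + pseudocount) / (mean_level2 + pseudocount))


# Hypothetical mean normalized counts for a single gene.
hiv_mean = 799.0
mock_mean = 199.0

# Factor Level 1 = HIV, Factor Level 2 = Mock (the ordering used in this lesson).
lfc_hiv_vs_mock = log2_fold_change(hiv_mean, mock_mean)

# Swapping the levels flips the sign but not the magnitude.
lfc_mock_vs_hiv = log2_fold_change(mock_mean, hiv_mean)

print(lfc_hiv_vs_mock)  # 2.0: upregulated in HIV relative to Mock
print(lfc_mock_vs_hiv)  # -2.0: same gene, opposite sign
```

A log2 fold change of 2 corresponds to a four-fold difference, so recording which level came first is necessary to read the results table correctly.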

View and interpret DESeq2 output files

Question 9: What are the top two most significant genes? Does the direction of change for gene MYC agree with our observation in Question 6?
Question 10: What observations can you make from the PCA plot? Do samples cluster as expected?

The p-value plot shows a histogram of p-values for all the genes that were examined. A p-value gives the probability of observing a logFC at least as extreme as the one measured if the true logFC = 0 for that gene (the null hypothesis). When the null hypothesis holds, p-values are expected to be uniformly distributed between 0 and 1. If there are true positives, you should see a peak close to zero.
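The expected shape of the histogram can be illustrated with a small simulation. The sketch below is not how DESeq2 computes p-values: it assumes a simple two-sided z-test, and the fraction of truly changed genes (20%) and their effect size (3 standard errors) are arbitrary values chosen for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(n_genes, frac_true, effect, rng):
    """Simulate one p-value per gene; a fraction of genes have a real shift."""
    p_values = []
    for _ in range(n_genes):
        shift = effect if rng.random() < frac_true else 0.0
        z = rng.gauss(shift, 1.0)
        p_values.append(two_sided_p(z))
    return p_values


def histogram(p_values, n_bins=10):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


rng = random.Random(0)  # fixed seed so the run is reproducible

# All genes null: bins should be roughly equal (uniform distribution).
null_hist = histogram(simulate_p_values(10000, 0.0, 3.0, rng))

# 20% of genes truly changed: the first bin (p < 0.1) should stand out.
mixed_hist = histogram(simulate_p_values(10000, 0.2, 3.0, rng))

print("all null:    ", null_hist)
print("20% changed: ", mixed_hist)
```

In the all-null run each bin holds roughly a tenth of the genes, while in the mixed run the bin nearest zero is clearly taller than the rest, which is the pattern to look for in the DESeq2 p-value plot.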


Question 11: What observation can you make about the p-value distribution? Does it look like there are many truly significant results? Note that the published dataset has been downsampled for instructional purposes.
