
Intro to RNA-Seq using Tufts Galaxy

The introductory slides give an overview of RNA sequencing technologies and our workflow.

Dataset

Our dataset is from the publication:

Chang et al. Next-Generation Sequencing Reveals HIV-1-Mediated Suppression of T Cell Activation and RNA Processing and Regulation of Noncoding RNA Expression in a CD4+ T Cell Line. mBio 2011. doi: 10.1128/mBio.00134-11

HIV infects CD4+ T cells, the very cells that are critical to mounting an immune response to the viral infection.


The experiment aims to compare the mRNA produced by mock-infected and HIV-infected CD4+ T cells at two time points, 12 hr and 24 hr after infection.

The raw reads from the study have been downsampled to 1 million reads per file in order to speed up computation. The full dataset is available from NCBI under accession SRP013224.
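For readers curious how a FASTQ file can be reduced to a fixed number of reads, here is a minimal sketch using reservoir sampling. It is not necessarily the method used to prepare this tutorial's files. The function names, the seed, and the assumption that every read occupies exactly four lines are illustrative choices.

```python
import random
from itertools import islice


def read_fastq_records(handle):
    """Yield each read as a tuple of its four lines (newlines stripped).

    Assumes the common four-line layout with no wrapped sequence lines.
    """
    while True:
        record = tuple(line.rstrip("\n") for line in islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random in a single pass.

    If fewer than k records are available, all of them are returned.
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample
```

With an open file handle `fh` (a hypothetical name), `reservoir_sample(read_fastq_records(fh), 1_000_000)` would keep one million reads without loading the whole file into memory. The sampled reads do not stay in their original file order.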

The following steps will walk you through how to run the tools. In each step certain parameters are set. If a parameter option appears on the screen but this tutorial doesn’t mention how to set it, leave it at the default. There are questions throughout, which serve to guide you through the results and check your understanding.

Create a new history

Import the raw data from a shared data library on our server

You’ll see the collection (or list) subsampled_chang_2011 in your history.

View Fastq files

The first 4 lines constitute the first sequencing read:

@SRR497699.30343179.1 HWI-EAS39X_10175_FC61MK0_4_117_4812_10346 length=75
CAGATGGCCGCAGAGGAAGCCATGAAGGCCCTGCATGGGGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGAC
+
IIIIGIIHFIIIIBIIDII>IIDHIIHDIIIGIFIIEIGIBDDEFIG<EIEGEEG;<DB@A8CC7<><C@BBDDB
  1. Sequence identifier
  2. Sequence
  3. + (optionally lists the sequence identifier again)
  4. Quality string (one character per base in the sequence)
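The four-line layout above can be checked programmatically. The following is a minimal sketch, not part of the Galaxy workflow. It splits the example read into its fields and converts the quality string to numeric Phred scores. The names `FastqRecord`, `parse_fastq_record` and `phred_scores` are hypothetical, and the offset of 33 assumes Sanger/Phred+33 encoding, which is consistent with the characters in this read.

```python
from typing import List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    description: str
    sequence: str
    quality: str


def parse_fastq_record(lines: List[str]) -> FastqRecord:
    """Split the four lines of one FASTQ read into named fields."""
    if len(lines) != 4:
        raise ValueError("a FASTQ record has exactly four lines")
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@"):
        raise ValueError("line 1 must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("line 3 must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality string differ in length")
    # Everything after '@' up to the first space is the read identifier.
    read_id, _, description = header[1:].partition(" ")
    return FastqRecord(read_id, description, sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


# The first read shown above.
record = parse_fastq_record([
    "@SRR497699.30343179.1 HWI-EAS39X_10175_FC61MK0_4_117_4812_10346 length=75",
    "CAGATGGCCGCAGAGGAAGCCATGAAGGCCCTGCATGGGGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGAC",
    "+",
    "IIIIGIIHFIIIIBIIDII>IIDHIIHDIIIGIFIIEIGIBDDEFIG<EIEGEEG;<DB@A8CC7<><C@BBDDB",
])
scores = phred_scores(record.quality)
print(record.read_id, len(record.sequence), min(scores), max(scores))
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so higher scores mean more reliable bases.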

Next: Process Raw Reads

Previous: Introduction to Galaxy