intro-to-galaxy-ngs-sarscov2

Read Alignment

Read alignment is the process of comparing short reads with a reference genome to find the best-matching position for each read. The Burrows-Wheeler Aligner (BWA) is a fast and accurate tool for aligning both short and long reads, and JBrowse is a tool for viewing the alignment results within the Galaxy interface.

Step 1: BWA alignment

The naive method of comparing each read in our dataset to each position in the reference sequence is too slow. Therefore, BWA builds an index of the reference sequence, which can be thought of as a lookup table for substrings present in our reference sequence. A short read can be compared to this lookup table in order to find potential matches. For more information on the Burrows-Wheeler Transform, see Stanford CS262 Lecture*

BWA builds an index of the reference sequence.
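
To make the lookup-table idea concrete, here is a minimal Python sketch. It is not BWA's actual data structure (BWA uses an index based on the Burrows-Wheeler Transform). Instead it uses a plain dictionary of k-mers, which is easier to read and shows the same principle: index the reference once, then look up each read instead of scanning every position. The function names and the toy reference string are made up for illustration.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def find_exact_matches(read, reference, index, k):
    """Look up the read's first k bases, then verify the full read at each hit."""
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits

# Toy reference and read (illustrative only, 0-based positions).
reference = "ATTAAAGGTTTATACCTTCCCAGGTAACAAACC"
index = build_kmer_index(reference, k=5)
print(find_exact_matches("TTTATACC", reference, index, k=5))
```

The index is built once and reused for every read, which is why it pays off when millions of reads are aligned. A real aligner also tolerates mismatches and gaps, which this sketch does not attempt.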

To run BWA:

Configuration of BWA

SAM format

BWA produces a BAM file, which is the compressed binary version of a Sequence Alignment/Map (SAM) file. A SAM file consists of a Header section and an Alignment section. The Header section gives details about the file format and the reference sequence used in alignment, and the Alignment section gives information about each read that was aligned.

Image Source https://www.samformat.info/
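
The two sections can be told apart programmatically: header lines start with `@`, and each alignment line is tab-separated with 11 mandatory fields followed by optional tags. The sketch below parses a small SAM text by hand to show that layout. The `parse_sam` function and the example record are made up for illustration; for real files a dedicated library such as pysam would normally be a better choice.

```python
# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

def parse_sam(text):
    """Split SAM text into header lines and a list of alignment records."""
    headers, alignments = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@"):          # Header section
            headers.append(line)
            continue
        cols = line.split("\t")           # Alignment section
        record = dict(zip(SAM_FIELDS, cols[:11]))
        for name in INT_FIELDS:
            record[name] = int(record[name])
        record["TAGS"] = cols[11:]        # optional fields such as NM:i:0
        alignments.append(record)
    return headers, alignments

# Hypothetical two-line header and a single aligned read.
example = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:toy_ref\tLN:33",
    "read1\t0\ttoy_ref\t9\t60\t8M\t*\t0\t0\tTTTATACC\tIIIIIIII\tNM:i:0",
])
headers, alignments = parse_sam(example)
print(len(headers), alignments[0]["POS"], alignments[0]["CIGAR"])
```

Note that POS in SAM is 1-based, so a read starting at the ninth base of the reference has POS 9.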

(Optional) Run Samtools Flagstat to view alignment metrics

Result of Samtools Flagstat
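
Flagstat derives its counts from the FLAG field of each alignment, where each bit records one property of the read. The sketch below decodes a FLAG value and tallies a few simplified metrics. The bit values come from the SAM specification, but `decode_flag`, `simple_flagstat`, and the list of example flags are hypothetical, and the tallies are a simplification rather than a reimplementation of Samtools Flagstat.

```python
# FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

def simple_flagstat(flags):
    """Tally a few flagstat-like counts (simplified, for illustration)."""
    primary = [f for f in flags if not f & 0x900]  # not secondary/supplementary
    return {
        "total": len(flags),
        "primary": len(primary),
        "mapped": sum(1 for f in flags if not f & 0x4),
        "properly_paired": sum(1 for f in primary if f & 0x1 and f & 0x2),
        "duplicates": sum(1 for f in flags if f & 0x400),
    }

# Hypothetical FLAG values: a proper pair, an unmapped pair, one supplementary.
flags = [99, 147, 77, 141, 2147]
print(decode_flag(99))
print(simple_flagstat(flags))
```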

Downsample BAM for quicker viewing

Before we view our alignment, we’ll downsample our BAM file to contain only a fraction of the original reads. This will be sufficient to view the major variants present and confirm that we have sequenced the Delta variant. NOTE: An alternative is to increase the Maximum size of BAM chunks to 20,000,000 in the JBrowse settings in the following section, but this results in much slower loading of the sample.
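
One way to picture downsampling is as a keep-or-drop decision made per read name rather than per record, so that both mates of a pair are kept or dropped together. The sketch below does this by hashing the read name into a number and comparing it with the requested fraction. This is an illustration under that assumption, not a description of what the Galaxy tool does internally; `keep_read`, `downsample`, and the read names are hypothetical.

```python
import hashlib

def keep_read(read_name, fraction, seed=0):
    """Decide deterministically whether to keep a read, based on its name."""
    digest = hashlib.md5(f"{seed}:{read_name}".encode()).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    return value < fraction * 2**64

def downsample(read_names, fraction, seed=0):
    """Keep roughly `fraction` of the reads; mates share a name, so pairs stay intact."""
    return [name for name in read_names if keep_read(name, fraction, seed)]

# Hypothetical paired-end data: each read name appears twice (one per mate).
names = [f"read{i}" for i in range(10000) for _ in range(2)]
kept = downsample(names, fraction=0.1)
print(len(kept), "of", len(names), "records kept")
```

Because the decision depends only on the name and the seed, rerunning with the same seed gives the same subset, and changing the seed gives a different one.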

View Downsampled BAM file using JBrowse

JBrowse is a convenient tool that allows viewing of alignments, genomes and gene annotation within the Galaxy interface.

Configure JBrowse viewer

Open JBrowse viewer
