What is the main challenge with RNA-seq alignment to a reference genome?

The major challenge in RNA-Seq data analysis is accurately mapping junction reads (reads that span exon-exon junctions) to their genomic origins. To detect splice sites in short reads, many RNA-Seq aligners use a reference transcriptome to inform the placement of junction reads.
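
To make the junction-read problem concrete, here is a toy sketch in Python. It assumes a made-up genome with one annotated intron and exact matching only. The function names (`find_contiguous`, `place_with_junctions`) are hypothetical, and this is not how STAR or HISAT2 are implemented. It shows that a read spanning the exon-exon junction has no ungapped match in the genome, but can be placed once the annotated intron coordinates are used to split it.

```python
def find_contiguous(genome: str, read: str) -> int:
    """Return the 0-based start of an exact, ungapped match, or -1."""
    return genome.find(read)


def place_with_junctions(genome: str, read: str, introns: list[tuple[int, int]]):
    """Try to place `read` across each annotated intron.

    Each intron is (start, end) in 0-based, half-open genome coordinates:
    the upstream exon ends at `start`, the downstream exon begins at `end`.
    Returns (read_start, split_offset, intron) for the first exact placement,
    or None if no annotated junction explains the read.
    """
    for start, end in introns:
        # Split the read at every internal position: the left part must end
        # exactly at the donor site, the right part must begin at the acceptor.
        for k in range(1, len(read)):
            left, right = read[:k], read[k:]
            read_start = start - k
            if (
                read_start >= 0
                and genome[read_start:start] == left
                and genome[end:end + len(right)] == right
            ):
                return read_start, k, (start, end)
    return None


# Toy gene (made up): exon 1, one annotated intron, exon 2.
exon1, intron, exon2 = "ATGGCTAACG", "GTAAAAAAAAAG", "TCCGATTCGA"
genome = exon1 + intron + exon2
introns = [(len(exon1), len(exon1) + len(intron))]  # [(10, 22)], 0-based half-open
read = exon1[-5:] + exon2[:5]                        # "TAACGTCCGA" spans the junction

print(find_contiguous(genome, read))                # -1: no ungapped match
print(place_with_junctions(genome, read, introns))  # (5, 5, (10, 22))
```

Real aligners also discover unannotated junctions and tolerate mismatches; the annotation here only narrows the search.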

What is RNA-seq alignment?

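RNA-seq alignment is the process of mapping sequencing reads back to the locations in a reference genome (or transcriptome) from which they most likely originated. The RNA-seq read alignment program currently used by the Expression Atlas pipeline is HISAT2, which stands for “hierarchical indexing for spliced alignment of transcripts 2”; it is designed to provide accurate, fast and sensitive spliced alignment.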

What is STAR alignment?

Spliced Transcripts Alignment to a Reference (STAR) is a fast RNA-seq read mapper, with support for splice-junction and fusion read detection. STAR aligns reads by finding the Maximal Mappable Prefix (MMP) hits between reads (or read pairs) and the genome, using an uncompressed suffix array index.
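
The MMP idea can be illustrated with a naive suffix array in Python. This is a simplified sketch under stated assumptions: an uppercase DNA alphabet, exact matching only, and a toy genome small enough to sort all suffixes directly. The helper names are hypothetical, and STAR's actual index construction and search are far more optimized.

```python
from bisect import bisect_left


def build_suffix_array(genome: str) -> list[int]:
    """Start positions of all suffixes of `genome`, in lexicographic order.

    Naive construction, for illustration on tiny inputs only.
    """
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def maximal_mappable_prefix(genome: str, sa: list[int], read: str):
    """Return (length, positions): the longest prefix of `read` that occurs
    exactly in `genome`, and every genome position where it occurs.

    Suffixes sharing a prefix form one contiguous block of the suffix array,
    so the block is narrowed one base at a time with binary search.
    """
    suffixes = [genome[i:] for i in sa]  # materialized for clarity (toy sizes only)
    lo, hi = 0, len(suffixes)
    best_len, best_lo, best_hi = 0, 0, 0
    for length in range(1, len(read) + 1):
        prefix = read[:length]
        # First suffix >= prefix, then first suffix past every string starting
        # with prefix ("\xff" sorts after A, C, G, T and N).
        new_lo = bisect_left(suffixes, prefix, lo, hi)
        new_hi = bisect_left(suffixes, prefix + "\xff", new_lo, hi)
        if new_lo == new_hi:  # no suffix starts with this prefix: stop extending
            break
        lo, hi = new_lo, new_hi
        best_len, best_lo, best_hi = length, lo, hi
    return best_len, sorted(sa[best_lo:best_hi])


genome = "GATTACAGATTC"
sa = build_suffix_array(genome)
print(maximal_mappable_prefix(genome, sa, "GATTT"))  # (4, [0, 7])
```

Because the matching block only shrinks as the prefix grows, each extension searches inside the previous block rather than the whole array.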

Is STAR splice-aware?

Yes. STAR is a splice-aware aligner, meaning it can align reads that span splice junctions. The alignment process consists of choosing an appropriate reference genome to map our reads against and performing the read alignment using one of several splice-aware alignment tools, such as STAR or HISAT2.

What is RNA-seq used for?

RNA sequencing (RNA-Seq) uses the capabilities of high-throughput sequencing methods to provide insight into the transcriptome of a cell. Compared to previous Sanger sequencing- and microarray-based methods, RNA-Seq provides far higher coverage and greater resolution of the dynamic nature of the transcriptome.

What does RNA STAR do?

The STAR software package (Dobin et al., 2013) enables highly accurate and ultra-fast alignment of RNA-seq reads to a reference genome. In addition to detecting annotated and novel splice junctions, STAR can discover more complex RNA sequence arrangements, such as chimeric and circular RNAs.

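How long does STAR indexing take?

When splice junction annotations are supplied at the mapping stage, STAR inserts the junctions into the genome index on the fly before mapping, which takes about 1 to 2 minutes. Building the genome index itself (--runMode genomeGenerate) is a separate, one-time step that usually takes considerably longer for large genomes.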
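
The difference between building an index and inserting junctions at the mapping stage can be sketched as two STAR command lines, assembled in Python without running them. The flags shown (`--runMode genomeGenerate`, `--genomeFastaFiles`, `--sjdbGTFfile`, `--sjdbOverhang`, `--sjdbFileChrStartEnd`) are documented STAR options, but the file names, thread counts and helper function names are hypothetical placeholders; check the STAR manual for your version before relying on them.

```python
import shlex


def star_genome_generate(genome_dir: str, fasta: str, gtf: str,
                         read_length: int, threads: int = 4) -> list[str]:
    """One-time index build with annotated junctions included (usually the slow step)."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        # The STAR manual recommends (read length - 1) for the junction overhang.
        "--sjdbOverhang", str(read_length - 1),
    ]


def star_map_with_extra_junctions(genome_dir: str, reads: list[str],
                                  sjdb_file: str, threads: int = 4) -> list[str]:
    """Map against an existing index; STAR inserts the listed junctions on the fly."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--sjdbFileChrStartEnd", sjdb_file,
    ]


# Hypothetical file names, for illustration only.
build = star_genome_generate("star_index", "genome.fa", "annotation.gtf", read_length=100)
align = star_map_with_extra_junctions(
    "star_index", ["sample_R1.fastq", "sample_R2.fastq"], "extra_SJ.tab"
)
print(shlex.join(build))
print(shlex.join(align))
```

Returning argument lists rather than shell strings keeps file names with spaces safe if the command is later passed to `subprocess.run`.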

How is the STAR aligner used in RNA-seq?

To determine where on the human genome our reads originated, we will align them to the reference genome using STAR (Spliced Transcripts Alignment to a Reference). STAR is an aligner specifically designed to address many of the challenges of RNA-seq data mapping, using a strategy that accounts for spliced alignments.

How do you align reads to the genome using STAR?

The basic options for aligning reads to the genome using STAR are:

Listed below are additional parameters that we will use in our command:

NOTE: Default filtering is applied so that a read may have at most 10 multiple alignments (loci); this limit is set by --outFilterMultimapNmax. If a read exceeds this number, no alignment is output for it.
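
A minimal sketch of an alignment command, assembled in Python. The flags are standard STAR options; the values (index directory, FASTQ names, output prefix, thread count) are hypothetical placeholders, and the helper `star_align_cmd` is illustrative, not part of STAR. Passing `--outFilterMultimapNmax 10` simply makes the default multimapping filter explicit.

```python
import shlex


def star_align_cmd(genome_dir: str, reads: list[str], out_prefix: str,
                   threads: int = 6, max_multimap: int = 10) -> list[str]:
    """Build (but do not run) a STAR alignment command line."""
    return [
        "STAR",
        "--runThreadN", str(threads),          # CPU threads to use
        "--genomeDir", genome_dir,             # directory containing the STAR index
        "--readFilesIn", *reads,               # one FASTQ (single-end) or two (paired-end)
        "--outFileNamePrefix", out_prefix,     # prefix for every output file
        "--outSAMtype", "BAM", "SortedByCoordinate",  # coordinate-sorted BAM output
        # Reads mapping to more than this many loci get no alignment output
        # (10 is STAR's default; written out here only to make it visible).
        "--outFilterMultimapNmax", str(max_multimap),
    ]


cmd = star_align_cmd("star_index", ["sample_R1.fastq", "sample_R2.fastq"], "results/sample_")
print(shlex.join(cmd))
```

With STAR installed and the placeholder paths replaced, `subprocess.run(cmd, check=True)` would execute the command.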

Why do we need to align RNA sequence data?

The theory behind aligning RNA sequence data is essentially the same as discussed earlier in the book, with one caveat: mature mRNA sequences do not contain introns. Eukaryotic gene models contain introns, which are removed by splicing during or after transcription, so a read that spans an exon-exon junction does not match the genome as one contiguous stretch.
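
A toy Python sketch of why introns complicate alignment: it splices a made-up gene model (two exons, one intron, 0-based half-open coordinates) and maps transcript positions back to the genome. The function names are hypothetical. Adjacent bases in the mature transcript can lie an intron's length apart in the genome, which is exactly the gap a splice-aware aligner has to model.

```python
def splice(genome: str, exons: list[tuple[int, int]]) -> str:
    """Join exon sequences in order; the intron sequence between them is dropped."""
    return "".join(genome[start:end] for start, end in exons)


def transcript_to_genome(exons: list[tuple[int, int]], tpos: int) -> int:
    """Convert a 0-based position in the spliced transcript to a genome position."""
    if tpos < 0:
        raise ValueError("transcript position must be non-negative")
    offset = tpos
    for start, end in exons:
        length = end - start
        if offset < length:
            return start + offset
        offset -= length  # skip past this exon and keep walking
    raise ValueError("position lies beyond the end of the transcript")


genome = "AAACCC" + "GTTTAG" + "GGGTTT"  # exon 1, intron, exon 2
exons = [(0, 6), (12, 18)]

print(splice(genome, exons))           # AAACCCGGGTTT
print(transcript_to_genome(exons, 5))  # 5  (last base of exon 1)
print(transcript_to_genome(exons, 6))  # 12 (first base of exon 2)
```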

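How does the STAR alignment algorithm work?

The algorithm achieves this highly efficient mapping with a two-step process: seed searching, followed by clustering, stitching, and scoring. In the seed-searching step, for every read that STAR aligns, STAR searches for the longest sequence that exactly matches one or more locations on the reference genome. These longest matching sequences are called Maximal Mappable Prefixes (MMPs). The search is then repeated on the unmapped remainder of the read, and in the second step the resulting seeds are stitched together into a full alignment.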
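
The two steps can be sketched in Python on a toy genome. This is a simplified illustration under strong assumptions: exact matching only, the leftmost genome hit stands in for STAR's full set of MMP loci, and "stitching" just checks that seeds are in increasing genome order and reports the genome gap between them as a candidate intron. STAR's real clustering and scoring (mismatches, gap penalties, splice-site motifs) are not modeled, and all names are hypothetical.

```python
def naive_mmp(genome: str, read: str, start: int) -> tuple[int, int]:
    """Longest prefix of read[start:] with an exact match in genome.

    Returns (length, leftmost genome position), or (0, -1) if even the first
    base is absent. Brute force, for illustration only.
    """
    best = (0, -1)
    for end in range(start + 1, len(read) + 1):
        pos = genome.find(read[start:end])
        if pos == -1:
            break  # a longer prefix cannot match if this one does not
        best = (end - start, pos)
    return best


def seed_search(genome: str, read: str) -> list[tuple[int, int, int]]:
    """Step 1 sketch: find the MMP, then restart from the first unmapped base."""
    seeds = []  # (read_start, genome_start, length)
    i = 0
    while i < len(read):
        length, pos = naive_mmp(genome, read, i)
        if length == 0:
            i += 1  # base absent from the genome: skip it in this toy
            continue
        seeds.append((i, pos, length))
        i += length
    return seeds


def stitch(seeds):
    """Step 2 sketch: require seeds in increasing genome order and report the
    genome gap between consecutive seeds (a candidate intron when non-empty)."""
    gaps = []
    for (_, g1, l1), (_, g2, _) in zip(seeds, seeds[1:]):
        if g2 < g1 + l1:
            return None  # overlapping or out of order: cannot stitch here
        gaps.append((g1 + l1, g2))
    return gaps


# Toy gene: exon 1, intron (starts GT, ends AG), exon 2.
genome = "ATGGCTAACG" + "GTAAAAAAAAAG" + "TCCGATTCGA"
read = "TAACG" + "TCCGA"  # spans the exon 1 / exon 2 junction

seeds = seed_search(genome, read)
print(seeds)          # [(0, 5, 5), (5, 22, 5)]
print(stitch(seeds))  # [(10, 22)]
```

Here the first MMP stops exactly at the exon boundary because the genome continues into the intron, so the second seed search starts at the junction and lands in exon 2.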