How does StringTie work?

StringTie uses a genome-guided transcriptome assembly approach along with concepts from de novo genome assembly to improve transcript assembly. Using a mapping of reads to the reference genome, genome-guided transcript assemblers cluster the reads and build graph models representing all possible isoforms for each gene.
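The read-clustering step described above can be illustrated with a small sketch. This is not StringTie's actual implementation: it assumes reads are already aligned to a single chromosome and are represented only by hypothetical (start, end) coordinates, and it ignores strand and splicing. It groups reads into clusters whenever their alignments overlap, which is roughly how a genome-guided assembler decides which reads belong to the same locus before building a graph for it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, 1-based inclusive


def cluster_reads(alignments: List[Interval]) -> List[List[Interval]]:
    """Group read alignments into clusters of mutually overlapping reads."""
    clusters: List[List[Interval]] = []
    cluster_end = None  # rightmost coordinate covered by the current cluster
    for start, end in sorted(alignments):
        if cluster_end is None or start > cluster_end:
            # No overlap with the current cluster: open a new one.
            clusters.append([(start, end)])
            cluster_end = end
        else:
            # Overlaps the current cluster: extend it.
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
    return clusters


# Hypothetical read coordinates, for illustration only.
reads = [(100, 150), (140, 190), (500, 550), (180, 230), (540, 600)]
clusters = cluster_reads(reads)
for i, cluster in enumerate(clusters, start=1):
    print(f"cluster {i}: {cluster}")
```

Each resulting cluster would then be handed to the graph-building step, which is where the isoform structure is actually modeled.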

What does StringTie merge do?

In the merge mode, StringTie takes as input a list of GTF/GFF files and merges/assembles these transcripts into a non-redundant set of transcripts. This mode is used in the new differential analysis pipeline to generate a global, unified set of transcripts (isoforms) across multiple RNA-Seq samples.
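As a rough sketch of how merge mode might be invoked from a script, the helper below assembles a command line using StringTie's `--merge`, `-G` and `-o` options. The function name `build_merge_command` and all file names are hypothetical; the block only builds the argument list and does not run StringTie.

```python
from typing import List, Optional


def build_merge_command(
    gtf_files: List[str],
    output_gtf: str,
    reference_gtf: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an argument list for StringTie's merge mode."""
    cmd = ["stringtie", "--merge"]
    if reference_gtf is not None:
        # Optional reference annotation to include in the merge.
        cmd += ["-G", reference_gtf]
    cmd += ["-o", output_gtf]
    # Per-sample GTF files produced by earlier StringTie runs.
    cmd += gtf_files
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_merge_command(
    ["sample1.gtf", "sample2.gtf"],
    output_gtf="merged.gtf",
    reference_gtf="annotation.gtf",
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where StringTie is installed.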

Why use StringTie?

When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph.

What is StringTie?

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus.

What are GTF files used for?

The Gene Transfer Format (GTF) is a file format used to hold information about gene structure. It is a tab-delimited text format based on the General Feature Format (GFF), with some additional conventions specific to gene information.
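To make the tab-delimited layout concrete, here is a minimal sketch that splits one GTF line into its nine standard columns. The record type `GtfRecord`, the helper names and the example line are all hypothetical, and the attribute column is kept as raw text rather than parsed.

```python
from typing import NamedTuple


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: str  # raw attribute text, left unparsed in this sketch


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one GTF line into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    return GtfRecord(seqname, source, feature, int(start), int(end),
                     score, strand, frame, attributes)


def feature_length(record: GtfRecord) -> int:
    """Length in bases; both coordinates are inclusive."""
    return record.end - record.start + 1


# Made-up example line, for illustration only.
line = 'chr1\texample\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";'
record = parse_gtf_line(line)
print(record.feature, record.start, record.end, feature_length(record))
```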

What is the difference between GTF and GFF?

GFF and GTF are both tab-separated text formats with the same general structure. They differ mainly in the underlying system or ontology used for the annotation, along with some smaller differences in the format itself. In the SeqAn library, GffFileIn can read GFF files in versions 2 and 3 as well as GTF files; for writing, GffFileOut supports only GFF 3 and GTF.
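One of the smaller format differences is the syntax of the ninth (attribute) column: GTF writes `key "value";` pairs, while GFF3 writes `key=value` pairs separated by semicolons. The sketch below parses both styles. The function names and example strings are hypothetical, and the simple split on `;` assumes that no value contains a semicolon.

```python
from typing import Dict
from urllib.parse import unquote


def parse_gtf_attributes(text: str) -> Dict[str, str]:
    """Parse GTF-style attributes such as: gene_id "G1"; transcript_id "G1.1";"""
    attrs: Dict[str, str] = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gff3_attributes(text: str) -> Dict[str, str]:
    """Parse GFF3-style attributes such as: ID=G1.1;Parent=G1"""
    attrs: Dict[str, str] = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        # GFF3 percent-encodes reserved characters inside values.
        attrs[key] = unquote(value)
    return attrs


# Made-up attribute strings, for illustration only.
gtf_attrs = parse_gtf_attributes('gene_id "G1"; transcript_id "G1.1";')
gff3_attrs = parse_gff3_attributes("ID=G1.1;Parent=G1")
print(gtf_attrs)
print(gff3_attrs)
```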

Why should you use alignment-independent quantification for RNA-Seq?

In addition to the higher accuracy for the point expression estimates, the alignment-independent tools also allow the user to bootstrap the expression estimates to get an estimate of the technical variability associated with the point estimate.
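The bootstrapping idea can be sketched in a few lines. This is a deliberately simplified illustration and not the procedure of any particular tool: it assumes each read has already been assigned to exactly one transcript (hypothetical labels `tx_A` and `tx_B`), resamples the reads with replacement, and reports the spread of the resulting counts. Real quantifiers would re-estimate abundances on each resampled data set rather than simply recount.

```python
import random
import statistics
from collections import Counter
from typing import Dict, List


def bootstrap_counts(
    read_assignments: List[str],
    n_boot: int = 200,
    seed: int = 0,
) -> Dict[str, Dict[str, float]]:
    """Resample reads with replacement and summarise per-transcript counts."""
    rng = random.Random(seed)
    transcripts = sorted(set(read_assignments))
    samples: Dict[str, List[int]] = {t: [] for t in transcripts}
    n_reads = len(read_assignments)
    for _ in range(n_boot):
        # One bootstrap replicate: draw n_reads reads with replacement.
        counts = Counter(rng.choices(read_assignments, k=n_reads))
        for t in transcripts:
            samples[t].append(counts.get(t, 0))
    return {
        t: {"mean": statistics.mean(v), "sd": statistics.stdev(v)}
        for t, v in samples.items()
    }


# Hypothetical read-to-transcript assignments, for illustration only.
reads = ["tx_A"] * 80 + ["tx_B"] * 20
summary = bootstrap_counts(reads)
for transcript, stats in summary.items():
    print(transcript, stats)
```

The standard deviation across replicates serves as the estimate of technical variability around the point estimate.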

How are gene abundances reported in a StringTie output file?

With the -A option, gene abundances are reported in tab-delimited format in the output file with the given name. With the -C option, StringTie writes a file with the given name containing all transcripts in the provided reference file that are fully covered by reads (requires -G).

Which is an example of accounting for gene length?

Gene length: Accounting for gene length is necessary for comparing expression between different genes within the same sample. In the example, Gene X and Gene Y have similar levels of expression, but the number of reads mapped to Gene X would be many more than the number mapped to Gene Y because Gene X is longer.
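The Gene X and Gene Y example can be worked through numerically. The sketch below uses transcripts per million (TPM) as one common length-aware measure. The read counts and gene lengths are made up for illustration, and the `tpm` helper is hypothetical.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM)."""
    # Step 1: reads per kilobase, which removes the gene-length effect.
    rate = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Step 2: rescale so that the values sum to one million.
    total = sum(rate.values())
    return {g: r / total * 1e6 for g, r in rate.items()}


# Made-up numbers: Gene X is four times longer and collects four times the reads.
counts = {"gene_X": 400, "gene_Y": 100}
lengths = {"gene_X": 4000, "gene_Y": 1000}
values = tpm(counts, lengths)
print(values)
```

Although Gene X has four times as many mapped reads, both genes end up with the same length-normalized value, which matches the statement that their expression levels are similar.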

What kind of gene transfer format does StringTie use?

The primary output of StringTie is a Gene Transfer Format (GTF) file that contains details of the transcripts that StringTie assembles from RNA-Seq data. GTF is an extension of GFF (Gene Finding Format, also called General Feature Format), and is very similar to GFF2 and GFF3.

How are reads related to gene length in DGE?

Dashed lines connect the segments of a read that spans an intron. Gene length: Accounting for gene length is necessary for comparing expression between different genes within the same sample.