What is per sequence GC content?

Summary. This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.

What is the GC content of the human genome?

The average GC content of a 100-kb fragment of the human genome can be as low as 35% or as high as 60%, a range that is twice as wide as that typically observed in teleostean fishes, for instance (International Human Genome Sequencing Consortium 2001).

What is an overrepresented sequence?

Overrepresented sequences: a list of sequences that appear more often than expected in the file. Only the first 50 bp of each read are considered. A sequence is considered overrepresented if it accounts for ≥ 0.1% of the total reads. Each overrepresented sequence is compared against a list of common contaminants to try to identify it.
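The rule above (first 50 bp, at least 0.1% of reads) can be sketched in a few lines of Python. This is a simplified illustration, not the tool's actual implementation; the function name `overrepresented` and its parameters are hypothetical, and the contaminant lookup is left out.

```python
from collections import Counter

def overrepresented(reads, prefix_len=50, threshold=0.001):
    """Return (prefix, count, fraction) for prefixes at or above the threshold.

    Hypothetical helper: only the first `prefix_len` bases of each read are
    compared, and `threshold` is a fraction of all reads (0.001 = 0.1%).
    """
    if not reads:
        return []
    total = len(reads)
    counts = Counter(read[:prefix_len].upper() for read in reads)
    hits = [(seq, n, n / total) for seq, n in counts.items() if n / total >= threshold]
    # Most frequent sequences first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Toy input with a deliberately high threshold so the small example is readable
toy_reads = ["AAAA", "AAAA", "AAAA", "CCCC"]
print(overrepresented(toy_reads, threshold=0.5))
```

With real data the default threshold of 0.001 would be used, and each reported sequence would then be checked against a contaminant list.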

What is the GC rule?

Reviewed on 6/3/2021. Chargaff's rule: the rule that in double-stranded DNA the amount of A always equals the amount of T, and the amount of G always equals the amount of C. (A is adenine, T is thymine, G is guanine, and C is cytosine.)
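As a small illustration of why the rule holds for a double helix, the sketch below counts bases over one strand plus its reverse complement. The function names are hypothetical and the input is assumed to contain only A, C, G and T.

```python
from collections import Counter

# Translation table mapping each base to its pairing partner
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the opposite strand, read 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]

def duplex_counts(strand):
    """Count bases over both strands of a double-stranded molecule."""
    return Counter(strand.upper()) + Counter(reverse_complement(strand))

counts = duplex_counts("AAAGC")
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

A single strand on its own need not satisfy the equality; it is base pairing across the two strands that forces A to match T and G to match C.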

What happens when GC content is too high?

A high GC content will probably make your template much harder to amplify, but don’t despair: you can address this. To improve amplification, you can increase the annealing temperature and/or add DMSO or another secondary-structure destabilizer so that your GC-rich template is amplified.

What should I expect from the per sequence GC content?

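Note that the following description applies to the per base N content plot, not to per sequence GC content. Per base N content is the percent of bases at each position or bin with no base call, i.e. ‘N’. What to expect: the curve should never rise noticeably above zero. If it does, this indicates that a problem occurred during the sequencing run. For per sequence GC content itself, expect a roughly normal distribution whose peak matches the overall GC content of the genome.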
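The percentage of ‘N’ calls per position described above can be computed as in the sketch below. The function name is hypothetical, and the code reports every position individually rather than grouping positions into bins.

```python
def n_content_per_position(reads):
    """Percent of reads with an 'N' at each position.

    Reads may differ in length; each position is divided by the number of
    reads that actually reach it.
    """
    max_len = max((len(read) for read in reads), default=0)
    n_counts = [0] * max_len
    coverage = [0] * max_len
    for read in reads:
        for i, base in enumerate(read.upper()):
            coverage[i] += 1
            if base == "N":
                n_counts[i] += 1
    return [100.0 * n / c for n, c in zip(n_counts, coverage)]

# One of three reads has an uncalled base at the second position
print(n_content_per_position(["ACGT", "ANGT", "ACG"]))
```

In a healthy run every value in the returned list stays close to zero.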

How is GC content calculated in a random library?

This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content. In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome.
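The per-read measurement can be sketched as follows: compute the GC percentage of each read, then count how many reads fall at each integer percentage. The function names are hypothetical, and the sketch assumes GC% is taken over the full read length, which may differ from how a particular tool treats uncalled bases.

```python
def gc_percent(read):
    """GC percentage of one read, over its full length (assumption)."""
    read = read.upper()
    if not read:
        raise ValueError("empty read")
    return 100.0 * (read.count("G") + read.count("C")) / len(read)

def gc_histogram(reads):
    """Number of reads at each integer GC percentage from 0 to 100."""
    hist = [0] * 101
    for read in reads:
        hist[round(gc_percent(read))] += 1
    return hist

hist = gc_histogram(["GGCC", "ATAT", "ACGT", "GCAT"])
print(hist[0], hist[50], hist[100])
```

Plotting `hist` against the percentages 0 to 100 gives the observed curve that is compared with the modelled normal distribution.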

What to look for in whole genome shotgun sequencing?

Plot of the number of reads vs. GC% per read. The displayed Theoretical Distribution assumes a uniform GC content for all reads. What to look for: For whole genome shotgun sequencing the expectation is that the GC content of all reads should form a normal distribution with the peak of the curve at the mean GC content for the organism sequenced.

How is the GC content of a genome calculated?

Since we don’t know the GC content of the genome, the modal GC content is calculated from the observed data and used to build a reference distribution. An unusually shaped distribution could indicate a contaminated library or some other kind of biased subset.
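One way to build such a reference is sketched below: take the most common GC percentage as the centre, estimate the spread of the observed values around it, and scale a normal curve to the number of reads. This is an illustrative assumption about the procedure, not a confirmed description of any tool's code; `reference_distribution` and the toy histogram are hypothetical.

```python
import math

def reference_distribution(hist):
    """Build a normal curve centred on the modal GC% of an observed histogram.

    `hist[gc]` is the number of reads with that integer GC percentage.
    Returns (mode, standard deviation, expected read counts per GC%).
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")
    mode = max(range(len(hist)), key=lambda gc: hist[gc])
    # Spread is measured around the mode rather than the mean
    variance = sum(n * (gc - mode) ** 2 for gc, n in enumerate(hist)) / total
    sd = math.sqrt(variance) or 1.0  # fall back to 1.0 if all reads share one GC%
    norm = sd * math.sqrt(2 * math.pi)
    expected = [
        total * math.exp(-((gc - mode) ** 2) / (2 * sd ** 2)) / norm
        for gc in range(len(hist))
    ]
    return mode, sd, expected

# Toy histogram: 40 reads clustered around 41% GC
observed = [0] * 101
observed[40], observed[41], observed[42] = 10, 20, 10
mode, sd, expected = reference_distribution(observed)
print(mode, round(sd, 3))
```

Comparing `observed` with `expected` position by position then shows where the library departs from the reference, for example a second peak from a contaminant with a different GC content.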