Sequence Mappability & Alignability

Sequence uniqueness within the genome plays an important part when attempting to map short sequences, e.g. next-generation short sequencing reads. It is one of the factors that can introduce a bias in sequencing or its analysis. The other important factor is GC content: GC-rich sequences (e.g. genic/exonic regions) as well as very GC-poor regions are often under-represented (Bentley et al. 2008), mainly because of amplification steps in the protocol. Because reads that map to multiple regions are often discarded, genomic regions with high sequence degeneracy or low sequence complexity show lower mapped read coverage than unique regions, creating a systematic bias.

The CRG Alignability tracks at the UCSC genome browser display how uniquely k-mer sequences align to a region of the genome. As you can see from the tracks, the mappability increases with read length:

CRG mappability tracks for different read lengths at the UCSC browser

For each window (of size 36, 40, 50, 75 or 100 nt), a mappability score was computed:
S = 1 / (number of matches found in the genome),
so S = 1 means one match in the genome, S = 0.5 means two matches, and so on. A further description is given in the publication by Thomas Derrien, Paolo Ribeca, et al. The data for these tracks can be downloaded. If you are working with other read lengths or genomes, you can run the software to generate the data yourself: get the GEM library (latest version at GitHub), unpack it with tar xjvf GEM-libraries-Linux-x86_64.tbz2, and create an index:

gem-indexer -i genome.fa -o gem_index   # genome.fa is a placeholder name for your reference FASTA

Then run the mappability step, e.g. with a read length of 250:

gem-mappability -I gem_index -l 250 -o mappability_250.gem
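
To make the score definition S = 1 / (number of matches) concrete, here is a minimal Python sketch on a toy sequence. It is not how GEM works internally: it counts exact matches on the forward strand only and ignores mismatches and the reverse complement, which a real mapper would need to consider. The function names and the toy sequence are made up for illustration.

```python
from collections import Counter


def mappability_scores(genome: str, k: int) -> list[float]:
    """Return S = 1 / (number of exact occurrences) for each k-mer start position.

    Simplifying assumptions: exact matches only, forward strand only.
    """
    if k < 1 or k > len(genome):
        raise ValueError("k must be between 1 and the sequence length")
    genome = genome.upper()
    # Enumerate every k-mer window, one per start position.
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    # Count how often each k-mer occurs anywhere in the sequence.
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]


def mean_mappability(genome: str, k: int) -> float:
    """Average score over all windows, as a rough summary for one read length."""
    scores = mappability_scores(genome, k)
    return sum(scores) / len(scores)


toy_genome = "ACGTACGT"  # hypothetical toy sequence, not real data
for k in (4, 5):
    print(k, mappability_scores(toy_genome, k), mean_mappability(toy_genome, k))
```

In this toy sequence the 4-mer ACGT occurs twice, so both of its positions get S = 0.5, while every 5-mer is unique and gets S = 1. This mirrors, on a very small scale, the observation that mappability increases with read length.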

References:

  • Fast computation and applications of genome mappability. Derrien T, et al. PLoS One. 2012
  • The uniqueome: a mappability resource for short-tag sequencing. Koehler et al. Bioinformatics. 2011; 27(2): 272–274.
  • Blog post at MassGenomics
  • Systematic bias in high-throughput sequencing data and its correction by BEADS. Cheung et al. 2011
  • Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry. Bentley et al., Nature 2008