Customize Consent Preferences

We use cookies to help you navigate efficiently and perform certain functions. You will find detailed information about all cookies under each consent category below.

The cookies that are categorized as "Necessary" are stored on your browser as they are essential for enabling the basic functionalities of the site. ... 

Always Active

Necessary cookies are required to enable the basic features of this site, such as providing secure log-in or adjusting your consent preferences. These cookies do not store any personally identifiable data.

No cookies to display.

Functional cookies help perform certain functionalities like sharing the content of the website on social media platforms, collecting feedback, and other third-party features.

No cookies to display.

Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics such as the number of visitors, bounce rate, traffic source, etc.

No cookies to display.

Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.

No cookies to display.

Advertisement cookies are used to provide visitors with customized advertisements based on the pages you visited previously and to analyze the effectiveness of the ad campaigns.

No cookies to display.

CRAM format notes

CRAM files are compressed versions of BAM files containing (aligned) sequencing reads. They represent a further file size reduction for this type of data that is generated at ever increasing quantities. Where SAM files are human-readable text files optimized for short read storage, BAM files are their binary equivalent, and CRAM files are a restructured column-oriented binary container format for even more efficient storage.

Tke key components of the approach are that positions are encoded in a relative way (i.e., the difference between successive positions is stored rather than the absolute value) and stored as a Golomb code. Also, only differences to the reference genome are listed instead of the full sequence.

The compression rates achieved are shown in the graph below generated by Uppsala University:

Comparing speed: Using the C implementation of for CRAM (James K. Bonfield), decoding is 1.5–1.7× slower than generating BAM files, but 1.8–2.6× faster at encoding. (File size savings are reported at 34–55%.)

Additional compression can be achieved by reducing the granularity of the quality values which will result in lossy compression though. Illumina suggested a binning of Q scores without significant calling performance. 

Binning of similar Q-scores (Illumina):

Compression achieved by Q-score binning (Illumina):

Sources and further reading:

  1. Format definition and usage
  2. cram-toolkit
  3. Detailed report at the Uppsala University
  4. SAMtools with CRAM support
  5. Original article from Markus Hsi-Yang Fritz, Rasko Leinonen, Guy Cochrane and Ewan Birney
  6. Article about the implementation in C
  7. Illumina while paper on Qscore compression

Feature Image: Cram Vectors by Vecteezy