SAM format summary

I wrote the following in my old blog back in 2012, but the SAM / BAM format is still the de facto standard for working with (aligned) sequence data…

The Sequence Alignment/Map (SAM) format is a generic format for storing read alignments against reference sequences. It is a text format that stores the data in tab-delimited ASCII columns and is commonly used in next-generation sequencing data processing. It is the human-readable (non-binary) counterpart of the BAM format and records, for each read, the sequence, its aligned position in the genome and quality information. It was developed by Heng Li (in Richard Durbin’s group at the Wellcome Trust Sanger Institute) and others; their paper is here.

After an optional header section, the alignment section holds one tab-delimited line per alignment. The format is best explained with an example line:

1:497:R:-272+13M17D24M 113 1 497 37 37M 15 100338662 0 CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG 0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>> XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37

| Field name | Description | Example data |
|---|---|---|
| QNAME | read name | 1:497:R:-272+13M17D24M |
| FLAG | alignment flag | 113 |
| RNAME | alignment chromosome | 1 |
| POS | alignment start position | 497 |
| MAPQ | overall mapping quality | 37 |
| CIGAR | alignment CIGAR string | 37M |
| MRNM/RNEXT | name of next alignment in group (mate) | 15 |
| MPOS/PNEXT | position of next alignment in group (mate) | 100338662 |
| ISIZE/TLEN | observed Template LENgth | 0 |
| SEQ | sequence | CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG |
| QUAL | quality per base | 0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>> |
| TAGs | further tags with alignment info | XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37 |
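
As a rough illustration of the column layout above, here is a small Python sketch that splits the example line into its eleven mandatory fields plus the optional tags. The names `SamRecord` and `parse_sam_line` are made up for this post and do not come from any library. The sketch only handles simple tag types, so for real work a dedicated library such as pysam is the safer choice.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """One alignment line; field names follow the table above (hypothetical helper)."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-delimited alignment line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(fields)}")
    tags = {}
    for item in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE, e.g. NM:i:0 or MD:Z:37
        tag, type_code, value = item.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


# The example line from above, with the tabs written out explicitly
example = "\t".join([
    "1:497:R:-272+13M17D24M", "113", "1", "497", "37", "37M", "15",
    "100338662", "0", "CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG",
    "0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>",
    "XT:A:U", "NM:i:0", "SM:i:37", "AM:i:0", "X0:i:1", "X1:i:0",
    "XM:i:0", "XO:i:0", "XG:i:0", "MD:Z:37",
])

record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar, record.tags["X0"])
```

Splitting on tabs only is deliberate: the read name itself contains colons, so the colon is only used as a separator inside the optional tag columns.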

The read names (QNAME), at least those from Illumina machines, are constructed as:

[instrument-name]:[run ID]:[flowcell ID]:[lane-number]:[tile-number]:[x-pos]:[y-pos] [read number]:[is filtered]:[control number]:[barcode sequence]

for example: @M01117:25:000000000-A37B9:1:1101:14984:1386 1:N:0:4
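
To make the naming scheme concrete, here is a minimal Python sketch that takes the example name apart. The function name `parse_illumina_name` and the dictionary keys are my own choices, and the sketch assumes the name follows exactly the layout shown above (seven colon-separated parts, a space, then four more).

```python
def parse_illumina_name(name: str) -> dict:
    """Split an Illumina-style read name into its parts (hypothetical helper)."""
    name = name.lstrip("@")  # drop the leading '@' if present
    location, _, comment = name.partition(" ")
    instrument, run_id, flowcell, lane, tile, x_pos, y_pos = location.split(":")
    info = {
        "instrument": instrument,
        "run_id": int(run_id),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x_pos": int(x_pos),
        "y_pos": int(y_pos),
    }
    if comment:
        # Second block: read number, filter flag, control number, barcode
        read_number, is_filtered, control_number, barcode = comment.split(":")
        info.update({
            "read_number": int(read_number),
            "is_filtered": is_filtered == "Y",
            "control_number": int(control_number),
            "barcode": barcode,
        })
    return info


info = parse_illumina_name("@M01117:25:000000000-A37B9:1:1101:14984:1386 1:N:0:4")
print(info["instrument"], info["flowcell"], info["lane"], info["tile"])
```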

To decode the FLAG values, such as the 113 in the example above, and to filter reads by these flags, there is a great page at the Broad Institute.
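
The FLAG is a bit field, so it can also be decoded by hand. The following Python sketch checks each bit defined in the SAM specification; the short descriptions and the name `decode_flag` are my own wording, not an official API.

```python
# Bit values as defined in the SAM specification; descriptions paraphrased
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list:
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [desc for bit, desc in FLAG_BITS.items() if flag & bit]


# 113 = 1 + 16 + 32 + 64
print(decode_flag(113))
# ['read paired', 'read reverse strand', 'mate reverse strand', 'first in pair']
```

The same bit test works for filtering, for example `flag & 0x4` to pick out unmapped reads.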

The TAGs are optional and vary between alignment programs; the ones shown here come from BWA. The tags most often used for filtering are X0:i (the number of best hits of this read in the genome) and XM:i (the number of mismatches in the alignment).

| TAG | Meaning |
|---|---|
| NM | Edit distance |
| MD | Mismatching positions/bases |
| AS | Alignment score |
| BC | Barcode sequence |
| X0 | Number of best hits |
| X1 | Number of suboptimal hits found by BWA |
| XN | Number of ambiguous bases in the reference |
| XM | Number of mismatches in the alignment |
| XO | Number of gap opens |
| XG | Number of gap extensions |
| XT | Type: Unique/Repeat/N/Mate-sw |
| XA | Alternative hits; format: (chr,pos,CIGAR,NM;)* |
| XS | Suboptimal alignment score |
| XF | Support from forward/reverse alignment |
| XE | Number of supporting seeds |
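
A filter on the X0 and XM tags could look roughly like the Python sketch below. It keeps a read only if BWA reported exactly one best hit and at most a chosen number of mismatches. The function names and the default threshold of 2 are arbitrary choices for illustration, and reads that lack either tag are rejected, which may or may not be what you want.

```python
def parse_tags(optional_fields) -> dict:
    """Turn TAG:TYPE:VALUE strings into a dict, converting 'i' values to int."""
    tags = {}
    for item in optional_fields:
        tag, type_code, value = item.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    return tags


def is_unique_low_mismatch(sam_line: str, max_mismatches: int = 2) -> bool:
    """True if X0 == 1 and XM <= max_mismatches (assumed filter criteria)."""
    columns = sam_line.rstrip("\n").split("\t")
    tags = parse_tags(columns[11:])  # optional tags start at column 12
    x0 = tags.get("X0")
    xm = tags.get("XM")
    if x0 is None or xm is None:
        return False  # tags missing, e.g. output from another aligner
    return x0 == 1 and xm <= max_mismatches


# Shortened example line: 11 mandatory columns followed by a few BWA tags
line = "\t".join([
    "1:497:R:-272+13M17D24M", "113", "1", "497", "37", "37M", "15",
    "100338662", "0", "CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG",
    "0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>",
    "XT:A:U", "NM:i:0", "X0:i:1", "XM:i:0", "MD:Z:37",
])
print(is_unique_low_mismatch(line))
```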

Sources:
genome.sph.umich.ed with further useful details, full specs.
Image by PublicDomainPictures from Pixabay