Customize Consent Preferences

We use cookies to help you navigate efficiently and perform certain functions. You will find detailed information about all cookies under each consent category below.

The cookies that are categorized as "Necessary" are stored on your browser as they are essential for enabling the basic functionalities of the site. ... 

Always Active

Necessary cookies are required to enable the basic features of this site, such as providing secure log-in or adjusting your consent preferences. These cookies do not store any personally identifiable data.

No cookies to display.

Functional cookies help perform certain functionalities like sharing the content of the website on social media platforms, collecting feedback, and other third-party features.

No cookies to display.

Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics such as the number of visitors, bounce rate, traffic source, etc.

No cookies to display.

Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.

No cookies to display.

Advertisement cookies are used to provide visitors with customized advertisements based on the pages you visited previously and to analyze the effectiveness of the ad campaigns.

No cookies to display.

RNA-Seq data quality scores

There are different ways to encode the quality scores in FASTQ files from Next-generation sequencing machines. It is important to find out before using the data and to convert between formats if necessary.

  • Sanger format can encode a Phred quality score from 0 to 93 using ASCII characters 33 to 126 (although in raw read data the Phred quality score rarely exceeds 60, higher scores are possible in assemblies or read maps).
  • Illumina 1.3+ format can encode a Phred quality score from 0 to 62 using ASCII characters 64 to 126 (although in raw read data Phred scores from 0 to 40 only are expected).
  • Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score from -5 to 62 using ASCII characters 59 to 126 (although in raw read data Solexa scores from -5 to 40 only are expected)
  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126


 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
    with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)

Source: wikipedia

For a simple look-up from ASCII to numeric scores you can use the following list:

ASCII	numeric		ASCII	numeric
!	0		@	31
"	1		A	32
#	2		B	33
$	3		C	34
%	4		D	35
&	5		E	36
'	6		F	37
(	7		G	38
)	8		H	39
*	9		I	40
+	10		J	41
,	11		K	42
-	12		L	43
.	13		M	44
/	14		N	45
0	15		O	46
1	16		P	47
2	17		Q	48
3	18		R	49
4	19		S	50
5	20		T	51
6	21		U	52
7	22		V	53
8	23		W	54
9	24		X	55
:	25		Y	56
;	26		Z	57
<	27		[	58
=	28		\	59
>	29		]	60
?	30		^	61

You can convert the Solexa read quality to Sanger read quality with Maq:

maq sol2sanger s_1_sequence.txt s_1_sequence.fastq

where s_1_sequence.txt is the Solexa read sequence file. Missing this step will lead to unreliable SNP calling when aligning reads with Maq.

Source: maq-manual

Phred itself is a base calling program for DNA sequence traces developed during the initial automation phase of the sequencing of the human genome.
After calling bases, Phred examines the peaks around each base call to assign a quality score to each base call. Quality scores range from 4 to about 60, with higher values corresponding to higher quality. The quality scores are logarithmically linked to error probabilities, as shown in the following table:

Phred quality	Probability of		Accuracy of
score		wrong base call		base call
10 		1 in 10 		90%
20 		1 in 100 		99%
30 		1 in 1,000 		99.9%
40 		1 in 10,000 		99.99%
50 		1 in 100,000 		99.999%

“High quality bases” are usually scores of 20 and above (“Phred20 score”).

You can read the original publications about the Phred program and scoring by Brent Ewing et al. from Phil Green’s lab here and here.
Source: www.phrap.com