Telomeric Regions of the Human Genome

Posted in genome informatics, genomics

Telomeres form caps on the ends of chromosomes that prevent fusion of chromosomal ends and provide genomic stability.

During gametogenesis, reprogramming of the germ cells leads to elongation of telomeres up to their species-specific maximum.

In normal somatic cells, telomeres are progressively shortened with every cell division. This shortening in normal human cells limits the number of cell divisions. For human cells to proliferate beyond the senescence checkpoint, they need to stabilize telomere length. This is accomplished mainly by reactivation of the telomerase enzyme. Telomerase expression is under the control of many factors. Expression of telomerase can lead to cell immortalization and is activated during tumorigenesis, i.e. cancer.

Male Xq-telomeres are 1100 bp shorter than female Xq-telomeres.

The telomeric repeat found on all human chromosomes is “TTAGGG”.
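
As a rough illustration of how one might look for this repeat in a sequence, the sketch below finds the longest uninterrupted tandem run of a repeat unit. It also checks the reverse complement, CCCTAA, which is what the forward strand reads when the repeat sits on the opposite strand. The function names and the toy sequence are made up for this example.

```python
import re

def longest_repeat_run(sequence, unit="TTAGGG"):
    """Return the number of units in the longest uninterrupted tandem run of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)

def reverse_complement(sequence):
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    table = str.maketrans("ACGT", "TGCA")
    return sequence.upper().translate(table)[::-1]

# Toy sequence (invented): five clean repeats, one mutated unit, two more repeats.
toy = "ACGT" + "TTAGGG" * 5 + "TTAGCG" + "TTAGGG" * 2
print(longest_repeat_run(toy))                                     # 5
print(longest_repeat_run(reverse_complement(toy), unit="CCCTAA"))  # 5
```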

The centromeres and telomeres of the human chromosomes are not explicitly defined as region attributes in the Ensembl Perl API, so one option for checking these regions is to pull them out of the UCSC Table Browser. For this, select the assembly, then use the “Mapping and Sequencing tracks” group and the “Gap” table. The extracted locations of the human telomere regions are provided below for the genome assemblies GRCh37 (hg19) and GRCh38 (hg38). The coordinates are given in the 0-based UCSC coordinate system.

Telomeres of chromosome 17 have not been defined for assembly GRCh37. They are short, but do exist nonetheless. An assembly patch will address this.

Chromosome  Start (hg19)  End (hg19)  Start (hg38)  End (hg38)
chr1        0             10000       0             10000
chr1        249240621     249250621   248946422     248956422
chr2        0             10000       0             10000
chr2        243189373     243199373   242183529     242193529
chr3        0             10000       0             10000
chr3        198012430     198022430   198285559     198295559
chr4        0             10000       0             10000
chr4        191144276     191154276   190204555     190214555
chr5        0             10000       0             10000
chr5        180905260     180915260   181528259     181538259
chr6        0             10000       0             10000
chr6        171105067     171115067   170795979     170805979
chr7        0             10000       0             10000
chr7        159128663     159138663   159335973     159345973
chr8        0             10000       0             10000
chr8        146354022     146364022   145128636     145138636
chr9        0             10000       0             10000
chr9        141203431     141213431   138384717     138394717
chr10       0             10000       0             10000
chr10       135524747     135534747   133787422     133797422
chr11       0             10000       0             10000
chr11       134996516     135006516   135076622     135086622
chr12       0             10000       0             10000
chr12       133841895     133851895   133265309     133275309
chr13       0             10000       0             10000
chr13       115159878     115169878   114354328     114364328
chr14       0             10000       0             10000
chr14       107339540     107349540   107033718     107043718
chr15       0             10000       0             10000
chr15       102521392     102531392   101981189     101991189
chr16       0             10000       0             10000
chr16       90344753      90354753    90328345      90338345
chr17       NA            NA          0             10000
chr17       NA            NA          83247441      83257441
chr18       0             10000       0             10000
chr18       78067248      78077248    80363285      80373285
chr19       0             10000       0             10000
chr19       59118983      59128983    58607616      58617616
chr20       0             10000       0             10000
chr20       63015520      63025520    64434167      64444167
chr21       0             10000       0             10000
chr21       48119895      48129895    46699983      46709983
chr22       0             10000       0             10000
chr22       51294566      51304566    50808468      50818468
chrX        0             10000       0             10000
chrX        155260560     155270560   156030895     156040895
chrY        0             10000       0             10000
chrY        59363566      59373566    57217415      57227415
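
One way to use these coordinates is to check whether a region of interest falls into a telomere gap. The sketch below is only an illustration: it includes the hg38 values for two chromosomes from the table above, and it assumes that intervals are 0-based and half-open, as in UCSC tables. The dictionary and function names are invented for the example.

```python
# hg38 telomere gaps for two chromosomes, copied from the table above
# (assumed to be 0-based, half-open UCSC coordinates).
TELOMERES_HG38 = {
    "chr1": [(0, 10000), (248946422, 248956422)],
    "chrX": [(0, 10000), (156030895, 156040895)],
}

def overlaps_telomere(chrom, start, end, telomeres=TELOMERES_HG38):
    """True if the 0-based, half-open interval [start, end) overlaps a telomere gap."""
    return any(start < t_end and end > t_start
               for t_start, t_end in telomeres.get(chrom, []))

print(overlaps_telomere("chr1", 9500, 10500))   # True: the first 500 bases lie in the gap
print(overlaps_telomere("chr1", 10000, 10500))  # False: starts right after the gap
```

Chromosomes missing from the dictionary simply return False here, so a real check would need the full table for the assembly in use.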


Read-counting for PGT-A

Posted in bioinformatics, IVF, sequencing
ICSI, source: https://upload.wikimedia.org/wikipedia/commons/0/09/ICSI.jpg

This is an introductory article about Pre-Implantation Genetic Testing for Aneuploidy (PGT-A, formerly known as PGS) using sequencing-based read-counting. The procedure is simplified and I avoid discussing specific companies or products here.

Increasing the chances of a successful pregnancy by looking at the embryo’s chromosomes

During an IVF cycle there are in most cases multiple embryos available for transfer to the woman hoping to become pregnant. IVF treatment is expensive and a stressful experience for the woman. The goal of the treatment must therefore be to have success after one or only a few cycles. Many factors contribute to the success or failure of an IVF cycle, and many of them are still poorly understood. One of the best-understood factors is the genomic make-up of the embryo(s) transferred: If the cells have a non-standard number of chromosomes, the chances of survival are very low for the embryo.

In order to select an embryo with a high chance of implantation and pregnancy, Pre-Implantation Genetic Testing for Aneuploidy (PGT-A) can be performed. This procedure can assess the genome of an embryo at a high level by identifying the number of chromosomes found in a few cells sampled. Deeper levels of genetic analysis are of course also possible, but the impact of these findings is usually poorly understood and would lead to a selection process that we should stay clear of!

Counting chromosomes by counting sequencing reads

Determining the copy number of every chromosome can nowadays be done reliably with sequencing-based approaches. One way to achieve this is the read-counting method employed by many software solutions and commercial products. The method can be outlined as follows:

  1. Extract and amplify the DNA of a few embryonic cells
  2. Generate a sequencing library 
  3. Perform shallow-depth sequencing
  4. Align the sequencing reads back to the human reference genome
  5. Count how many reads you see in the different regions and chromosomes
  6. Determine the copy-numbers from this measurement.

This is of course vastly simplified and cannot be discussed at every level of detail here, but we can focus on the last steps a bit further.
The theory we build on is that if the retrieval of the cells, the generation of the sequencing library and the experimental protocol are kept the same, the number of sequencing reads counted in a specific genomic region should only depend on two factors:

A. The characteristics of that genomic region: Some parts of the genome are amplified more rapidly than others and will be overrepresented in the mixture.
B. The amount of input material: If there was an extra copy of the chromosome we should have more DNA going into the sequencing process and also see more reads coming out for this specific region.

Factors underlying point A are mostly based on the characteristics of the amplification reaction mentioned in step 1 above (e.g. which polymerase enzyme is used) and the base composition of the genome itself. The latter mostly comes into play when we try to place the sequencing reads back on the reference genome: If the genomic region the read originates from is repeated a few times on different chromosomes, the alignment process cannot reliably determine where to place the read and it has to be discarded. However, as the genome is >99% identical between individuals and because the amplification reaction should be performed in a standardised way, this type of bias is relatively constant. Point A above can therefore be analysed beforehand, and this amplification and mappability bias can be removed. This leaves us with the differences originating from point B, which are the actual differences that we are trying to assess in the embryo.
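
A minimal sketch of this bias removal, assuming a baseline of per-region read counts is available from samples believed to have a normal chromosome set: each region's share of reads in the sample is divided by its share in the baseline. The function name and all counts are invented for illustration and do not describe any specific product.

```python
def normalise_bins(sample_counts, reference_counts):
    """Divide each region's share of reads in the sample by its share in the reference.

    A ratio near 1.0 means the region behaves like the (assumed normal) reference.
    """
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    ratios = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0:
            ratios.append(None)  # no baseline, e.g. an unmappable region: skip it
        else:
            ratios.append((s / sample_total) / (r / reference_total))
    return ratios

# Invented counts: the second region amplifies poorly, the fourth is unmappable.
reference = [100, 60, 140, 0, 100]
sample = [50, 30, 70, 0, 50]
print(normalise_bins(sample, reference))
```

Because the sample here has exactly half the reads of the reference in every region, all usable ratios come out at about 1.0 even though the raw counts differ strongly between regions.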

In a standard human cell there are 2 copies of each of the autosomes (chromosomes 1 to 22); this becomes our “normal” level. If we look at the actual numbers of reads counted, we can derive the most likely copy number of a region by comparing them to our normal level:

Theoretical number of reads         0*   50   100   150   200
Assumed copy-number of the region   0    1    2     3     4

* There will usually be a few reads erroneously found in these regions
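
The comparison can be sketched in a few lines. The example assumes the theoretical numbers from the table, i.e. about 50 reads per copy, so that 100 reads correspond to the normal level of 2 copies; the function name and the read counts are made up.

```python
def estimate_copy_number(read_count, reads_per_copy=50):
    """Round the bias-corrected read count to the nearest whole number of copies."""
    return round(read_count / reads_per_copy)

# Invented read counts, each close to one of the theoretical values.
for count in (3, 48, 104, 155):
    print(count, "reads ->", estimate_copy_number(count), "copies")
```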

Even with very strict protocols there will be variation in read numbers, though, making the assessment more challenging. Additional processing steps can be applied to clean and smooth the data. For this to work it is assumed that most regions are at the “normal” level and that neighbouring measurement points behave similarly in most cases.
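
One simple smoothing step that relies on neighbouring points behaving similarly is a running median. This is only one possible choice, shown as a sketch; the window size and the values are arbitrary, and real pipelines may use other methods such as segmentation.

```python
from statistics import median

def running_median(values, window=5):
    """Replace each value by the median of the window centred on it (shorter at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed

# Invented copy-number estimates: a single outlier (3.4) inside a normal stretch,
# followed by a stretch that is consistently near 3 copies.
noisy = [2.0, 2.1, 1.9, 3.4, 2.0, 2.1, 3.0, 3.1, 2.9, 3.0]
print(running_median(noisy))
```

The isolated outlier is pulled back towards 2, while the consistent run near 3 copies at the end is kept.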

Visualising the measured copy-numbers for each region is a key part in the embryo assessment. Autosome read counts outside of the grey shaded area in the copy-number plot above would be considered gained / lost. The sample shown seems to have 3 copies of chromosome 16. Chromosomes X & Y show copy-number 1 as this would be a male sample.

This assessment can now guide the decision as to which of the embryos might have the highest chance of survival based on its genomic make-up and could be selected for implantation during an IVF cycle.

Microdeletion and Microduplication Syndromes in the Human Genome

Posted in genome informatics, genomics, health

Small changes in the sequence of human chromosomes can have detrimental effects on a person’s health and development. They often affect multiple genes but are too small to be visible with traditional karyotyping methods. These changes seem to arise as novel mutations, with some consistency, near regions of low complexity (repetitive sequence regions).

The following table shows a list of known syndromes and their genomic locations in hg18 (NCBI36) coordinates. It was compiled by A. Weise et al. in “Microdeletion and Microduplication Syndromes”, Journal of Histochemistry and Cytochemistry (2012).

Microdeletion Syndrome   Microduplication Syndrome   OMIM   Cytoband   Chromosome   Start Position (kb)   End Position (kb)   Length (kb)
microdeletion 1p366078721pter-p36.311053095309
microdeletion 1p36 (GABRD)microduplication 1p36 (GABRD)6130601pter-p36.3101000010000
microduplication 1p34.11p34.1145591468081217
microdeletion 1p32.26137351p32.2155500609005400
microdeletion 1p21.31p21.3197320992501930
microdeletion 1q21.1microduplication 1q21.16124751q21.111449801463431363
thrombocytopenia-absent radius syndrome/TAR2740001q21.11144150144427277
deletion 1q21.1 (GJA5)duplication 1q21.1 (GJA5)1210131q21.11145040145860820
microdeletion 1q24q251q24.3q25.111701351720991964
microdeletion 1q24.31q24.31170000170600600
Van der Woude syndrome/VWS11193001q32.2-q411207709208277568
microdeletion 1q41–426125301q41-q421221135221775640
corpus callosum agenesis microdeletion6123371q441242576242936360
microduplication 2p25.32p25.3232503450200
Feingold syndrome/FS1642802p24.3215999160056
hypotonia-cystinuria syndrome/HCS6064072p212443844444258
holoprosencephaly 2/HPE21571702p21245022450264
microduplication 2p212p2124520045900700
NRXN1 microdeletionNRXN1 microduplication6005652p16.325001150437426
microdeletion 2p15–16.16125132p15–16.1257537615343997
microdeletion 2p14-p156125132p14–15263756653771621
microdeletion 2p11.2-p126135642p11.2-p12277597870919494
microdeletion 2q11.2 (LMAN2L, ARID5A)2q11.229609097040950
mesomelic dysplasia/MMD6052742q11.2299530100125595
microdeletion 2q11.2q13 (NCK2, FHL2)microduplication 2q11.2q13 (NCK2, FHL2)602633/6049302q11.2q1321000601078107750
nephronophthisis 1/NPHP1microduplication 2q11.2q132561002q13211029311032027
microdeletion 2q13microduplication 2q132q1321110501129501900
autism-dyslexia microdeletion 2q14.3microduplication 2q14.3 (own case)2q14.321245001255001000
Mowat–Wilson syndrome/MWS2357302q22.3214490014499494
microdeletion 2q23.11562002q23.12148964149150186
microdeletion 2q23.3q24.11562002q23.3-q24.121531501569303780
microdeletion 2q24.3neonatal epilepsy microduplication607208/6044032q24.2-q24.321651331665621429
synpolydactyly 1/SPD1microduplication 2q31.16136812q31.121766591776791020
microdeletion 2q31.2-q32.36123452q31.2-q32.2217764019138013740
microdeletion 2q33.16123132q33.121965382049158377
brachydactyly-mental retardation syndrome/BDMR6004302q3722396202429513331
distal 3p deletion6137923p25-p263069956995
Von Hippel Lindau disease/VHL1933003p25-p263101581016911
microdeletion 3p21.313p21.31349120522203100
microdeletion 3p14.1p136055153p14.1-p1337116471959795
microdeletion 3p11.1p12.13p11.2-p12.138706987408339
proximal 3q microdeletion syndrome3q13.11-q13.1231064001089002500
microdeletion 3q13.313q13.313115335115916581
blepharophimosis, ptosis, and epicanthus inversus syndrome/BPES1101003q2331401461401482
Dandy–Walker syndrome/DWS2202003q2431486101486177
microdeletion 3q27.3q293q27.3-q2931888701980809210
microdeletion 3q29microduplication 3q29609425/6119363q2931971261989821856
Wolf–Hirschhorn syndrome/WHSmicroduplication 4p16.31941904pter-p16.34020432043
microduplication 4p16.14p16.149450104501000
microdeletion 4p15.34p15.3416583207474164
microdeletion 4q21.21q21.226135094q21.21q21.22481950833501400
microdeletion 4q216135094q21482228836011373
microdeletion 4q21.2q21.34q21.2-q21.34891488921870
Parkinson disease/PARK1163890/1686014q22.149074791018271
Rieger type 1/RIEG11805004q25411175811177921
4q32.1-q32.2 Triple/Duplication syndrome6136034q32.1q32.241573561616154259
Cri–du-Chat syndrome/CdCS1234505p15.2-p15.33501177711777
Cornelia de Lange syndrome/CDLSNIPBL microduplication6131745p13.25369973703336
spinal muscular atrophy/SMA2533005q13.2570278702868
microdeletion 5q14.36006625q14.358614286413271
microdeletion 5q14.3-q156128815q14.3-q15588400900901690
familial adenomatous polyposis/FAP1751005q22.25112129112249120
adult-onset autosomal dominant leukodystrophy/ADLD1695005q23.25126046126233187
PITX1 microdeletion6021495q31.15134222134463241
microdeletion 5q31.35q31.351391171416822565
Pseudo trisomy 13 syndrome2644805q35.151702221715841362
microdeletion 5q35.15q35.151725921725953
parietal foramina/PFM1685005q35.251740841740917
Sotos syndromemicroduplication 5q351175505q35.2-q35.351750631773892326
microdeletion 6p6125826p2560
microdeletion 6p22.36p22.362085021250400
adrenal hyperplasia/AH2019106p21.32632114321173
microdeletion 6p21.316p21.3163327334086813
microdeletion 6q13–146135446q13–14672650763103660
Prader–Willi like1762706q16.2610094310101875
transient neonatal diabetes mellitus 1/TNDM16014106q24.26144303144427124
microdeletion 6q25.2-q25.36128636q25.2-q25.361555001588533353
PARK2 microdeletionPARK2 microduplication6025446q2661616881627841096
microdeletion 6q27 anosmiaChondroma/CHDM2154006q2761655541707625208
Saethre–Chotzen syndrome/SCS1014007p21.1719121
Greig cephalopolysyndactyly/GCPS1757007p14.174196742243276
Williams–Beuren syndrome/WBSmicroduplication 7q11.23609757/1940507q11.23771971742552284
WBS-distal deletion (RHBDD2, HIP1)6137297q11.23774800765001700
split hand/foot malformation 1/SHFM1183600/2206007q21.3795370966191249
microdeletion 7q22.1-q22.37q22.1-q22.371010401045603520
autism/dyslexia microdeletion 7q31.17q31.17110654111266612
speech-language-disorder 1/SPCH16020817q3171140851140905
holoprosencephaly 3/HPE31429457q36.3715528815529810
triphalangeal thumb polysyndactyly syndrome/TPTS1745007q36.37155836156425589
Currarino syndrome/CS1764507q36.371564901564966
microdeletion 8p23.1microduplication 8p23.11796138p23.188156118033647
microdeletion 8p21.28p21.2820750243903640
microdeletion 8p12p218p12p21824500313006800
microduplication 8q11.236109288q11.2385345054050600
CHARGE syndromemicroduplication 8q122148008q12.286175461942188
microdeletion 8q12.3q13.28q12.3-q13.2865450690203570
mesomelia-synostoses syndrome/MSS6003838q1387054170908367
microdeletion 8q21.116142308q21.1187738977929540
nablus mask-like facial syndrome/NMLFS6081568q21.3-q22.1893210979404730
microdeletion 8q22.2q22.38q22.2-q22.381006901045603870
Langer–Giedion syndrome/LGS1502308q24.118118881119193312
sex reversal syndrome 4/SRXY41542309p24.39010481048
monosomy 9p syndrome1581709pter-p22.3901616816168
microduplication 9q21.116135589q21.1197105171197146
microdeletion 9q22.3PTCH1 microduplication6013099q22.3994420991004680
holoprosencephaly 7/HPE76108289q22.329972849731935
nail-patella syndrome/NPS1612009q33.3912841712849982
early infantile epileptic encephalopathy 4/EIEE46121649q34.11912941412949581
microdeletion 9q34 (EHMT1)microduplication 9q34 (EHMT1)6070019q34.391369501402003250
subtelomere deletion 9q6102539q34.39139473140273800
hypoparathyroidism, sensorineural deafness, and renal disease/HDRS14625510p15108137815720
Di George syndrome 2/DGS260136210p12.3110211442117026
microdeletion 10q22-q23 (NRG3, GRID1)10q22-q231081655889847329
juvenile polyposis syndrome/JPS61224210q23.2-q23.3108867589613938
Split-Hand/Foot Malformation 3/SHFM324656010q24.3210102977103445468
microdeletion 10q25q2660962510q25q2610117098qter18319
Beckwith–Wiedemann syndrome/BWS—Silver Russell syndrome/SRS microdeletionBeckwith–Wiedemann syndrome/BWS—Silver Russell syndrome/SRS microduplication13065011p15.511286128643
WAGR syndromemicroduplication 11p13194072/61246911p13113176732467700
Potocki–Shaffer syndrome/PSS60122411p11.21143905460802175
spinocerebellar ataxia type 20/SCA2060868711q12.2q12.3116121061503293
microdeletion 11q14.111q14.1-q14.211863348634410
Jacobsen syndrome/JBS147791/18802511q23.3-qter1111540013445219052
microduplication 12p13.3112p13.311280508250200
microdeletion 12q1412q141263356669323576
nasal speech-hypothyroidism microdeletion/NSH12q15-q21.11268802701392632590
Noonan syndrome 1/NS116395012q24.11211134111143291
microdeletion 13q12 (CRYL1)microduplication 13q12 (CRYL1)13q12.11131971019910200
spastic ataxia Charlevoix–Saguenay/SACS27055013q12.121322336238071471
microdeletion 13q12.3-q13.160018513q12.3-q13.1133113731871734
retinoblastoma/RB161388413q14.2134777647954178
Hirschsprung disease 2/HSCR260015513q2213773697739122
holoprosencephaly5/HPE560963713q32.31399432994375
microdeletion 14q11.261345714q11.214209202094727
congenital Rett variant/CRVmicroduplication 14q1261345414q121428300300001700
microdeletion 14q22-q2360793214q22-q231453486602616775
autism spherocytosis microdeletion/ASC14q23.2-q23.3146392464471547
microdeletion 14q32.214q32.214994631005741111
microdeletion 15q11.2 (NIPA1)microduplication 15q11.2 (NIPA1)60814515q11.2152035020640290
Angelman syndrome Typ1/AS1microduplication 1510583015q11.2-q13.11520405262315826
Angelman syndrome Typ2/AS2microduplication 1510583015q11.2-q13.11521309262314922
Prader–Willi syndrome Typ 1/ PWS1microduplication 1517627015q11.2-q13.11520405262315826
Prader–Willi syndrome Typ 2/ PWS2microduplication 1517627015q11.2-q13.11521309262314922
microdeletion 15q13.3 (CHRNA7)microduplication 15q13.3 (CHRNA7)61200115q13.31528525304891964
microdeletion 15q1415q141533471350721601
deafness and male infertility syndrome/DMIS61110215q15.3154161341747134
microdeletion 15q2115q21154838248565183
microdeletion 15q24 (BBS4,NPTN, NE01)60190715q241570700722001500
microdeletion 15q24microduplication 15q2461340615q241572158739491791
orofacial clefting/OC61429415q24.3-q25.21576080803384258
microdeletion 15q2561429415q25158290083600700
microdeletion 15q26.115q26.1159110091600500
Fryns syndrome/FNS22985015q26.21592238965204282
microdeletion 15q26.2-qter15q26.2-qter15956001003394739
ATR-16-syndrome14175016p13.3160774774
tuberous sclerosis microdeletion syndrome/PKDTStuberous sclerosis microduplication60027316p13.3162038207941
Rubinstein–Taybi syndrome 1/RSTS1Rubinstein–Taybi-microduplication610543/61345816p13.3163762380139
microdeletion 16p13.1 (MYH11)microduplication 16p13.1 (MYH11)13290016p13.11614789162811492
microdeletion 16p11.2-p12.2microduplication 16p11.2-p12.261360416p11.2-p12.21621521289507429
microdeletion 16p12.1 (EEF2K,CDR2)microduplication 16p12.2 (EEF2K,CDR2)117340/60696816p12.1162185022370520
16q11.2 distal microdeletion (SH2B1)16q11.2 distal microduplication (SH2B1)16q11.2162868029020340
microdeletion 16p11.2 (TBX6)microduplication 16p11.2 (TBX6)602427/61191316p11.2162955130059508
microdeletion 16q11.2-q12.116q11.2-q12.1164540145579178
microdeletion 16q21-q2216q21-q2216656216569271
microdeletion 16q12.1-q12.216q12.1-q12.21648018527264708
microdeletion 16q24.160108916q24.11682908851532245
FANCA deletion22765016q24.316883928841119
Miller–Dieker syndrome/MDLSMiller–Dieker microduplication247200/61321517p13.317024922492
microdeletion 17p13.3 (YWHAE)microduplication 17p13.3 (YWHAE)247200/61321517p13.3172–31028702870
microdeletion 17p13.161377617p13.11774297937508
hereditary liability to pressure palsies/HNPPCharcot–Marie–Tooth 1A/CMT1A162500/11822017p121713855153751520
Smith–Magenis syndrome/SMSPotocki–Lupski syndrome/PTLS61088317p11.21716527204233896
neurofibromatosis 1/NF1microduplication NF161367517q111726102272431141
microdeletion 17q11.2-q1217q11.2-q121726280310304750
microdeletion 17q12a17q121731977331501173
renal cysts and diabetes syndrome/RCADmicroduplication 17q12b13792017q121731830333501520
Van Buchem disease/VBCH23910017q12-q211739187391925
microdeletion 17q21.3 (MAPT)microduplication 17q21.31 (MAPT)610443/61353317q21.3174098841566578
microdeletion 17q21.31-q21.3217q21.31-q21.321741769431131344
microdeletion 17q22-q23.217q22–23.21748300542005900
microduplication 17q23.1–23.2613355/61361817q23.1–23.21755457576932236
microdeletion 17q24.2-q24.317q24.2-q24.31761730656903960
carney complex syndrome 1/CNC116098017q24.2-q24.31763260655942334
microduplication 17q24.327885017q24.31765642668471205
holoprosencephaly 4/HPE414639018p11.3118344534483
proximal 18q microdeletion60180818q12.3-q21.11837500425005000
Pitt–Hopkins syndrome/PTHS61095418q21.1185108351282199
microdeletion 18q22.3-q2360784218q22.3-q231870474731112637
Sotos-like microduplication 19p13.219p13.2199107110941987
microdeletion 19p13.13microduplication 19p13.1361363819p13.13191279313104311
microdeletion 19p13.1219p13.12191411914439320
microdeletion 19p13.1119p13.111916485175541069
microdeletion 19q13.1161302619q13.111937300402002900
Diamond–Blackfan anemia/DBA10565019q13.219470564706711
microdeletion 20p12.311226120p12.32069077012105
Alagille syndrome 1/ALGS111845020p12201047810669191
microdeletion 20q13.13-q13.220q13.13-q13.22049760508401080
Albright hereditary osteodystrophy/AHO10358020q13.3220569005692020
microdeletion 20q13.3320q13.332061246623761130
microdeletion 21q21.121q21.1211995020250300
microduplication 21q21.321q21.3212596026470510
platelet disorder/PD60139921q22.12213474335343600
Down syndrome/DS19068521q22.132137300385021202
Cat-Eye syndrome/CES11547022p11.1-q11.212201697716977
Di George syndrome/CATCH22/DGSmicroduplication 22q11.2608363/14541022q11.21-q11.232216932206723740
distal microdeletion 22q11.2 (BCR, MAPK1)distal microduplication 22q11.2 (BCR, MAPK1)61186722q11.22220446220261580
neurofibromatosis 2 microdeletion syndrome10100022q12.222283302842595
Phelan–McDermid syndromemicroduplication 22q13 (SHANK3)60623222q13224944949691242
Leri–Weill dyschondrosteosis/LWD127300Xp22.33X0724724
X-Linked autism-2/AUTSX2300495Xp22.32-p22.31X58186157339
Steroid sulphatase deficiency/STS308100Xp22.31X645281281676
Kallmann syndrome 1/KAL1308700Xp22.31X84578660203
MIDAS syndrome309801Xp22.2X1103911659620
Nance–Horan syndrome/NHS302350Xp22.13X1685317768915
microdeletion Xp22.11300830Xp22.11X2292823309381
X-linked congenital adrenal hypoplasia/AHCDAX1 microduplication300679Xp21.2X30233302374
complex glycerol kinase/CGK300679Xp21.2X3023330659426
muscular dystrophy Duchenne/DMD310200Xp21.2X3244533268823
Xp11.3 deletion syndrome300578Xp11.3X4619346627434
Goltz syndrome/GS305600Xp11.23X482524826412
17-beta-hydroxysteroid dehydrogenase X/HSD300801Xp11.22X5346753730263
microduplication Xq12q13.1300127Xq12-q13.1X67435686331198
X inactivation specific transcript/XIST314670Xq13.2X7286373063200
Bruton agammaglobulinemia/XLA300755Xq22.1X1004901004977
microdeletion Xq22.2Pelizaeus–Merzbacher microduplication/PMD312080Xq22.2X102609103098489
microdeletion Xq22.3q23300194/303631Xq22.3-q23X1072141102393025
lymphoproliferative syndrome 1/XLP1308240Xq25X12330812333527
X-linked hypopituitarism/SRXX3300833Xq27.1X1394131394152
fragile site mental retardation 1/FMR1309550Xq27.3X14680114684039
microdeletion Xq28Xq28X147043147543500
Rett syndrome/RSMECP2 microduplication300475/300815/ 300845Xq28X152535153044509
sex-determining region Y/SRY480000Yp11.31Y271527161
AZFa microdeletion415000Yq11.21Y1293413664730
AZFb microdeletion415000Yq11.221- q11.223Y18698244755777
AZFb+c microdeletion415000Yq11.221-q11.23Y18474262037729
AZFc microdeletion415000Yq11.223-q11.23Y23387262032816
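
A list like this can be used to check whether a detected copy-number change overlaps a known syndrome region. The sketch below copies only three rows from the table above and converts the kb positions to base pairs; the query interval is invented, and the function name is made up. The query must be in the same assembly coordinates as the table.

```python
# Three rows copied from the table above: (syndrome, chromosome, start_kb, end_kb).
SYNDROMES = [
    ("Williams-Beuren syndrome/WBS", "7", 71971, 74255),
    ("Smith-Magenis syndrome/SMS", "17", 16527, 20423),
    ("Di George syndrome/CATCH22/DGS", "22", 16932, 20672),
]

def overlapping_syndromes(chrom, start_bp, end_bp, syndromes=SYNDROMES):
    """Names of the listed regions that overlap the interval given in base pairs."""
    hits = []
    for name, s_chrom, s_start_kb, s_end_kb in syndromes:
        # Convert the table's kb positions to bp before comparing.
        if chrom == s_chrom and start_bp < s_end_kb * 1000 and end_bp > s_start_kb * 1000:
            hits.append(name)
    return hits

# Invented example: a 1.5 Mb change on chromosome 22.
print(overlapping_syndromes("22", 17_500_000, 19_000_000))
```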

Website optimization

Posted in web development

If you’re like me, you hate waiting for websites to load and sometimes just hit the “back” button if a site is too slow. If that was your website, you are missing out on visitors and maybe even customers! What’s worse: your site will also be less visible in search engines like Google, as part of the ranking score is based on page speed. During a webinar by Harald Köppe and freelance SEO specialist Beatrice Köhler I learned the following tips to improve the likelihood of your page, e.g. for your freelance consulting business, being found on the internet.

The main tool used is PageSpeed Insights from Google itself which measures the loading speed and gives great ideas on how to improve specific parts of your web page. Using my own bioinformatics freelancer business page gene-test.com as an example (WordPress), the speed score went from a bad 25 to a good 90 after a few minutes of optimization:

Figure: Score before optimization

Figure: Score after optimization

The main steps were:
1. Compress or convert images using the online software Squoosh.
2. Deactivate & remove unused WordPress modules, keep active ones up to date.
3. Use WordPress Plugin Autoptimize to e.g. automatically optimize CSS and JavaScript parts and to defer loading of images.

Figure: Image optimization with Squoosh (image compression options highlighted)

There are many more improvement options of course, but this was a very quick and impressive way to get started!

CRAM format notes

Posted in bioinformatics

CRAM files are compressed versions of BAM files containing (aligned) sequencing reads. They represent a further file size reduction for this type of data that is generated at ever increasing quantities. Where SAM files are human-readable text files optimized for short read storage, BAM files are their binary equivalent, and CRAM files are a restructured column-oriented binary container format for even more efficient storage.

The key components of the approach are that positions are encoded in a relative way (i.e., the difference between successive positions is stored rather than the absolute value) and stored as a Golomb code. Also, only differences to the reference genome are listed instead of the full sequence.
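
To make the idea concrete, here is a small sketch of relative position encoding followed by a textbook Golomb code (unary quotient plus truncated-binary remainder). It illustrates the principle only and is not the actual CRAM codec; the positions and the parameter m are invented.

```python
def delta_encode(positions):
    """Keep the first position as-is, then store the gap to the previous position."""
    if not positions:
        return []
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]

def golomb_encode(n, m):
    """Golomb code of a non-negative integer n with parameter m >= 1, as a bit string."""
    q, r = divmod(n, m)
    bits = "1" * q + "0"            # quotient in unary, terminated by a 0
    b = (m - 1).bit_length()        # ceil(log2(m))
    cutoff = (1 << b) - m
    if r < cutoff:                  # small remainders use b - 1 bits
        width, value = b - 1, r
    else:                           # the others use b bits
        width, value = b, r + cutoff
    if width > 0:
        bits += format(value, f"0{width}b")
    return bits

# Invented, sorted alignment start positions.
positions = [10000, 10035, 10036, 10120]
gaps = delta_encode(positions)
print(gaps)                                          # [10000, 35, 1, 84]
print([golomb_encode(g, m=32) for g in gaps[1:]])    # small gaps give short codes
```

Since reads sorted by position tend to start close to each other, the gaps are small numbers, and small numbers get short code words.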

The compression rates achieved are shown in the graph below generated by Uppsala University:

File size comparisons of SAM, BAM, CRAM

Comparing speed: Using the C implementation of CRAM (James K. Bonfield), decoding is 1.5–1.7× slower than for BAM files, but encoding is 1.8–2.6× faster. (File size savings are reported at 34–55%.)

Additional compression can be achieved by reducing the granularity of the quality values, although this results in lossy compression. Illumina suggested a binning of Q-scores that does not significantly affect calling performance.
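
The binning idea can be sketched as a lookup that maps every Q-score to one representative value per bin. The bin boundaries and representative scores below are illustrative assumptions for this example, not values taken from the Illumina document, and the function names are made up.

```python
import bisect

# Illustrative bins (assumed, not from the Illumina white paper):
# lower bound of each bin and the score that replaces every value in that bin.
BIN_LOWER_BOUNDS = [0, 2, 10, 20, 25, 30, 35, 40]
BIN_REPRESENTATIVES = [0, 6, 15, 22, 27, 33, 37, 40]

def bin_qscore(q):
    """Map a Phred quality score to the representative score of its bin."""
    if q < 0:
        raise ValueError("Phred scores cannot be negative")
    index = bisect.bisect_right(BIN_LOWER_BOUNDS, q) - 1
    return BIN_REPRESENTATIVES[index]

def bin_quality_string(quality, offset=33):
    """Bin a Phred+33 encoded quality string character by character."""
    return "".join(chr(bin_qscore(ord(c) - offset) + offset) for c in quality)

print(bin_qscore(31))              # 33
print(bin_quality_string("I5?"))   # I7B
```

Fewer distinct quality values mean longer runs of identical symbols, which is what lets a general-purpose compressor shrink the quality column further.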

Binning of similar Q-scores (Illumina):

qscore binning

Compression achieved by Q-score binning (Illumina):

qscore compression

Sources and further reading:

  1. Format definition and usage
  2. cram-toolkit
  3. Detailed report at the Uppsala University
  4. SAMtools with CRAM support
  5. Original article from Markus Hsi-Yang Fritz, Rasko Leinonen, Guy Cochrane and Ewan Birney
  6. Article about the implementation in C
  7. Illumina white paper on Q-score compression