A DNA Sequencing History

Posted in genomics, sequencing

Major landmarks in DNA sequencing and molecular biology

Structural formula of a DNA segment (Wikipedia)

Discovery of the structure of the DNA double helix (Watson, Crick, Franklin).

Proof of the semi-conservative nature of DNA replication (Meselson, Stahl)

First DNA triplet is decoded (Matthaei, Nirenberg)

Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.

The first gene is sequenced

The first complete DNA genome to be sequenced is that of bacteriophage φX174

Allan Maxam and Walter Gilbert publish “DNA sequencing by chemical degradation” [4].
Fred Sanger, independently, publishes “DNA sequencing by enzymatic synthesis”.

Fred Sanger and Wally Gilbert receive the Nobel Prize in Chemistry

GenBank starts as a public repository of DNA sequences.

Andre Marion and Sam Eletr from Hewlett Packard start Applied Biosystems in May, which comes to dominate automated sequencing.

Akiyoshi Wada proposes automated sequencing and gets support to build robots with help from Hitachi.

Restriction fragment length polymorphism fingerprinting (Jeffreys)

Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.

Kary Mullis and colleagues develop the polymerase chain reaction, a technique to replicate small fragments of DNA

Leroy E. Hood’s laboratory at the California Institute of Technology, with Lloyd Smith, announces the first semi-automated DNA sequencing machine.

Applied Biosystems markets this first automated sequencing machine, the model ABI 370.

Walter Gilbert leaves the U.S. National Research Council genome panel to start Genome Corp., with the goal of sequencing and commercializing the data.

The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae (at 75 US cents per base).

BLAST algorithm for aligning sequences published (Lipman, Myers).

Capillary electrophoresis published (Barry Karger, Lloyd Smith, Norman Dovichi).

Official start of the Human Genome Project

Craig Venter develops strategy to find expressed genes with ESTs (Expressed Sequence Tags).

Uberbacher develops GRAIL, a gene-prediction program.

Craig Venter leaves NIH to set up The Institute for Genomic Research (TIGR).

William Haseltine heads Human Genome Sciences, to commercialize TIGR products.

Wellcome Trust begins participation in the Human Genome Project.

Simon et al. develop BACs (Bacterial Artificial Chromosomes) for cloning.

First chromosome physical maps published:
-Page et al. – Y chromosome[28];
-Cohen et al. – chromosome 21[29];
-Lander – complete mouse genetic map[30];
-Weissenbach – complete human genetic map[31].

Wellcome Trust Sanger Institute

Wellcome Trust and MRC open Sanger Centre, near Cambridge, UK.

The GenBank database migrates from Los Alamos (DOE) to NCBI (NIH).

Venter, Fraser and Smith publish first sequence of free-living organism, Haemophilus influenzae (genome size of 1.8 Mb).

Richard Mathies et al. publish on sequencing dyes (PNAS, May)[32].

Michael Reeve and Carl Fuller, thermostable polymerase for sequencing[8].

International HGP partners agree to release sequence data into public databases within 24 hours.

International consortium releases genome sequence of yeast S. cerevisiae (genome size of 12.1 Mb).

Yoshihide Hayashizaki’s group at RIKEN completes the first set of full-length mouse cDNAs.

Blattner, Plunkett et al. publish the sequence of E. coli (genome size of 5 Mb)[33]

First cloned animal, Sheep “Dolly”, is born (Wilmut)

Phil Green and Brent Ewing of Washington University publish “phred” for interpreting sequencer data (in use since ’95)[34].

Venter starts a new company (Celera), which aims to sequence the human genome in 3 years for $300 million.

Wellcome Trust doubles support for the HGP to $330 million for 1/3 of the sequencing.

NIH & DOE goal: “working draft” of the human genome by 2001.

Sulston, Waterston et al. finish sequence of C. elegans (genome size of 97 Mb)[35].

NIH moves up completion date for rough draft, to spring 2000.

NIH launches the mouse genome sequencing project.

First sequence of human chromosome 22 published[36].

Celera and collaborators sequence fruit fly Drosophila melanogaster (genome size of 180 Mb) – validation of Venter’s shotgun method. HGP and Celera debate issues related to data release.

HGP consortium publishes sequence of chromosome 21.[37]

HGP & Celera jointly announce working drafts of the human genome sequence, promise joint publication.

Estimates for the number of genes in the human genome range from 35,000 to 120,000.

International consortium completes first plant sequence, Arabidopsis thaliana (genome size of 125 Mb).

HGP consortium publishes Human Genome Sequence draft in Nature (15 Feb)[38].

Celera publishes the Human Genome sequence[39].

HapMap project initiated to decipher human genetic variation

420,000 VariantSEQr human resequencing primer sequences published on new NCBI Probe database.

Genographic project launched to study human migration

A set of closely related species (12 Drosophilidae) are sequenced, launching the era of phylogenomics.

Craig Venter publishes his full diploid genome


Source: Wikipedia and ABI

Epigenetics and Epigenomics

Posted in genomics

The human DNA sequence has been read, so now we know how the genome works and how to detect and cure genetic diseases, don’t we?

Unfortunately not, or fortunately if you are working in this area. We know the sequence of bases for a number of reference and other genomes, but we are far from knowing and understanding all the variations that can be found between different people and the consequences of those variations. There are also other layers of information in the genome that we are only starting to understand. I am talking about the field of epigenetics here, which looks at molecular “tags” that are attached to the DNA at certain places and play a key role in activating or deactivating the genes in these places. In contrast to the actual DNA sequence, these markers are reversible and get altered during embryonic development and differentiation, i.e. when cells develop into a specific cell type, e.g. a skin cell. They also get modified in a less fortunate way as we get old and in certain disease conditions such as diabetes, inflammation or cancer. The study of these tags is called epigenetics, or epigenomics when applied to the entire human genome.

More specifically, these tags are molecular modifications, mostly methyl groups, that are usually attached to the DNA base cytosine and to histones, the proteins that the DNA is wrapped around to “get in shape”.


Vaccination of newborns

Posted in health

Most of us take vaccinations for granted and rely on them from our very first days. Whooping cough, for example, can be deadly, especially for babies who are too young to be protected by their own vaccination. Since 2010, the Centers for Disease Control and Prevention (CDC) has recorded between 10,000 and 50,000 cases each year in the United States, with up to 20 babies dying each year. One recent study showed that many whooping cough deaths among babies could be prevented if all babies received the first dose of the vaccine on time at 2 months, when they are old enough to be vaccinated (CDC). Still, some parents believe they know better and risk their children’s lives by not vaccinating them at all.

For the US the CDC recommends vaccination of newborns / babies against the following diseases:

For Germany the situation is almost the same, and the following vaccinations are recommended for babies under 2 years:

  • Haemophilus influenzae type b (Hib)
  • Diphtheria
  • Hepatitis B
  • Measles
  • Mumps
  • Pertussis (whooping cough)
  • Pneumococcus
  • Poliomyelitis (polio)
  • Rubella
  • Tetanus
  • Rotavirus
  • Varicella (chickenpox)
  • Meningococcus C

Sources: CDC, Robert Koch-Institut

Genetic Conditions Screened in Newborns

Posted in health, screening

As part of the health assessment of newborn babies, a test for common genetic conditions is done by drawing a few drops of blood from the baby’s heel and sending them off for analysis. Any positive result is followed up by a confirmatory test, and treatment can be initiated if required. Most of the conditions are life-threatening or disabling for the child if left undiagnosed or untreated.

Below is a list of conditions that are screened as part of the current standard panel of core conditions and secondary conditions in the US health system. Secondary conditions are results that are additionally (unintentionally) revealed when testing for the core conditions. If desired, there are even more options for testing (supplemental screening). Which tests are offered or paid for depends on the state and the insurance. This information is taken from babysfirsttest.org.

1. Metabolic Disorders


  • 2-Methyl-3-Hydroxybutyric Acidemia (2M3HBA)
  • 2-Methylbutyrylglycinuria (2MBG)
  • 3-Hydroxy-3-Methylglutaric Aciduria (HMG) *
  • 3-Methylcrotonyl-CoA Carboxylase Deficiency (3-MCC) *
  • 3-Methylglutaconic Aciduria (3MGA)
  • Beta-Ketothiolase Deficiency (BKT) *
  • Ethylmalonic Encephalopathy (EME)
  • Glutaric Acidemia, Type I (GA-1) *
  • Holocarboxylase Synthetase Deficiency (MCD)
  • Isobutyrylglycinuria (IBG)
  • Isovaleric Acidemia (IVA) *
  • Malonic Acidemia (MAL)
  • Methylmalonic Acidemia (Cobalamin Disorders) (Cbl A,B) *
  • Methylmalonic Acidemia (Methylmalonyl-CoA Mutase Deficiency) (MUT) *
  • Methylmalonic Acidemia with Homocystinuria (Cbl C, D, F)
  • Propionic Acidemia (PROP) *


  • 2,4 Dienoyl-CoA Reductase Deficiency (DE RED)
  • Carnitine Acylcarnitine Translocase Deficiency (CACT)
  • Carnitine Palmitoyltransferase I Deficiency (CPT-IA)
  • Carnitine Palmitoyltransferase Type II Deficiency (CPT-II)
  • Carnitine Uptake Defect (CUD) *
  • Glutaric Acidemia, Type II (GA-2)
  • Long-Chain L-3 Hydroxyacyl-CoA Dehydrogenase Deficiency (LCHAD) *
  • Medium-Chain Acyl-CoA Dehydrogenase Deficiency (MCAD) *
  • Medium-Chain Ketoacyl-CoA Thiolase Deficiency (MCAT)
  • Medium/Short-Chain L-3 Hydroxyacyl-CoA Dehydrogenase Deficiency (M/SCHAD)
  • Short-Chain Acyl-CoA Dehydrogenase Deficiency (SCAD)
  • Trifunctional Protein Deficiency (TFP) *
  • Very Long-Chain Acyl-CoA Dehydrogenase Deficiency (VLCAD) *


  • Argininemia (ARG)
  • Argininosuccinic Aciduria (ASA) *
  • Benign Hyperphenylalaninemia (H-PHE)
  • Biopterin Defect in Cofactor Biosynthesis (BIOPT-BS)
  • Biopterin Defect in Cofactor Regeneration (BIOPT-REG)
  • Carbamoyl Phosphate Synthetase I Deficiency (CPS)
  • Citrullinemia, Type I (CIT) *
  • Citrullinemia, Type II (CIT II)
  • Classic Phenylketonuria (PKU) *
  • Homocystinuria (HCY) *
  • Hypermethioninemia (MET)
  • Hyperornithinemia with Gyrate Deficiency (Hyper ORN)
  • Maple Syrup Urine Disease (MSUD) *
  • Nonketotic Hyperglycinemia (NKH)
  • Ornithine Transcarbamylase Deficiency (OTC)
  • Prolinemia (PRO)
  • Tyrosinemia, Type I (TYR I) *
  • Tyrosinemia, Type II (TYR II)
  • Tyrosinemia, Type III (TYR III)


2. Endocrine Disorders

  • Congenital Adrenal Hyperplasia (CAH) *
  • Primary Congenital Hypothyroidism (CH) *


3. Hemoglobin Disorders

  • Glucose-6-Phosphate Dehydrogenase Deficiency (G6PD)
  • Hemoglobinopathies (Var Hb)
  • S, Beta-Thalassemia (Hb S/βTh) *
  • S, C Disease (Hb S/C) *
  • Sickle Cell Anemia (Hb SS) *


4. Other Disorders

  • Adrenoleukodystrophy (ALD)
  • Biotinidase Deficiency (BIOT) *
  • Classic Galactosemia (GALT) *
  • Congenital Toxoplasmosis (TOXO)
  • Critical Congenital Heart Disease (CCHD) *
  • Cystic Fibrosis (CF) *
  • Formiminoglutamic Acidemia (FIGLU)
  • Galactoepimerase Deficiency (GALE)
  • Galactokinase Deficiency (GALK)
  • Hearing loss (HEAR)
  • Human Immunodeficiency Virus (HIV)
  • Hyperornithinemia-Hyperammonemia-Homocitrullinuria Syndrome (HHH)
  • Pyroglutamic Acidemia (5-OXO)
  • Severe Combined Immunodeficiency (SCID) *
  • T-cell Related Lymphocyte Deficiencies


5. Lysosomal Storage Disorders

  • Fabry (FABRY)
  • Gaucher (GBA)
  • Krabbe
  • Mucopolysaccharidosis Type-I (MPS I)
  • Mucopolysaccharidosis Type-II (MPS II)
  • Niemann-Pick Disease (NPD)
  • Pompe (POMPE)

See more at: www.babysfirsttest.org

Software Requirements Specification

Posted in bioinformatics, software

For any large software project (i.e. one that requires more than a few scripts performing a one-off task) and for every project that was initiated by a customer request, it is useful to precisely define the requirements before starting to write any code. This might be painful at times and slow down the coding fun, but it should avoid a lot of frustration on either side in the end.

Here is a short summary of what a Software Requirements Specification (SRS, IEEE 830) is, how to write one, and what it is good for.

An SRS is a complete description of the behavior of a system to be developed, including use cases.

The benefits of writing specifications when planning a software project are:

  • Establish the basis for agreement between the customers and the suppliers on what the software product is to do.
  • Reduce the development effort: revealing omissions, misunderstandings, and inconsistencies early in the development cycle avoids redesign, recoding, and retesting.
  • Provide a basis for estimating costs and schedules.
  • Provide a baseline for validation (comparison against what the customer needs) and verification (comparison with the formal specifications).
  • Facilitate transfer to new users or new machines.
  • Serve as a basis for enhancement.

Key points to address:

  • Required functionality.
  • External interfaces.
  • Performance.
  • Attributes.
  • Design constraints imposed on an implementation.

Avoid design details and coding details in the specs. Hardware requirements etc. go into the general System Specifications, not the SRS. The content and language of the document should meet the following criteria:

Complete, Consistent, Accurate, Modifiable, Ranked, Testable, Traceable, Unambiguous, Valid, Verifiable
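
To make a few of these key words more concrete, here is a minimal Python sketch of how individual requirements could be recorded and checked. It is not part of IEEE 830; the class name Requirement, its fields, and the helper functions find_problems and by_rank are my own assumptions for illustration, and the example requirements are made up. The idea is simply that a requirement with a unique ID and a source is traceable, one with an acceptance test is testable, and one with a rank can be prioritized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirement:
    """One entry of an SRS (all field names are illustrative assumptions)."""
    req_id: str                 # unique ID, so the requirement is traceable
    text: str                   # a single, unambiguous statement of behavior
    rank: int                   # priority, 1 = most important ("Ranked")
    acceptance_test: str = ""   # how to check it ("Testable", "Verifiable")
    source: str = ""            # customer request or use case it comes from


def find_problems(reqs: List[Requirement]) -> List[str]:
    """Return readable messages for requirements that fail basic checks."""
    problems = []
    seen_ids = set()
    for r in reqs:
        if r.req_id in seen_ids:
            problems.append(f"{r.req_id}: duplicate ID")
        seen_ids.add(r.req_id)
        if not r.acceptance_test.strip():
            problems.append(f"{r.req_id}: no acceptance test")
        if not r.source.strip():
            problems.append(f"{r.req_id}: no source")
    return problems


def by_rank(reqs: List[Requirement]) -> List[Requirement]:
    """Return the requirements sorted so the most important come first."""
    return sorted(reqs, key=lambda r: r.rank)


# Hypothetical example requirements for a small sequence-analysis tool.
requirements = [
    Requirement("R2", "Results can be exported as CSV.", 2),
    Requirement("R1", "The tool reads FASTA files.", 1,
                acceptance_test="Load the sample FASTA file without errors.",
                source="Use case 1"),
]

for problem in find_problems(requirements):
    print(problem)
print([r.req_id for r in by_rank(requirements)])
```

In this example R2 is flagged twice, because it has neither an acceptance test nor a source, while R1 passes. Criteria such as Complete, Consistent or Unambiguous cannot be checked this mechanically and still need a review with the customer.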

Descriptions of “use cases”, mock-up GUI components and other visual aids are extremely useful to communicate with the parties involved.