Genomic Start Coordinates

Adding to the confusion about different notations of phases/frames, genome browsers and file formats also disagree on how the start coordinates of genomic features are written. Two conventions are in use:

  1. One-based
    Counts the bases themselves, starting with “1” at the first base.
    Regions are specified by a “closed interval”: both the start and the end position belong to the region. Used e.g. by the Ensembl genome browser and annotation system and by the GFF/GTF, SAM and wiggle file formats.
  2. Zero-based
    The interbase system counts the spaces between bases, starting with “0” before the first base.
    Regions are specified by a “half-open interval”: the start is included, the end is excluded. Used e.g. by the UCSC genome browser, Chado (the GMOD database schema used by FlyBase, the fruit fly database), and the BED, BAM and PSL file formats.
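One practical consequence of the two conventions is how the length of a region is computed. The following is a minimal Python sketch; the function names are hypothetical and chosen only for illustration.

```python
def length_one_based(start: int, end: int) -> int:
    """Length of a closed interval [start, end] in one-based coordinates."""
    # Both endpoints are bases inside the region, so add 1.
    return end - start + 1


def length_zero_based(start: int, end: int) -> int:
    """Length of a half-open interval [start, end) in zero-based coordinates."""
    # The end position is excluded, so a plain difference is enough.
    return end - start
```

In the zero-based system the length is a simple subtraction, which is one reason half-open intervals are often preferred for interval arithmetic.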

An example:

  One-based
     1 2 3 4 5 6
     | | | | | |
     C G A T G C
    | | | | | | |
    0 1 2 3 4 5 6
  Zero-based

The ATG interval is described as 3-5 in the one-based system and as 2-5 in the zero-based system. Only the start coordinate differs; the end coordinate is the same number in both.
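Converting between the two systems can be sketched in a few lines of Python. The function names below are hypothetical. The sketch assumes closed one-based intervals and half-open zero-based intervals, as described above. Python string slicing is itself zero-based and half-open, so the converted coordinates can be used directly as slice indices.

```python
def one_to_zero_based(start: int, end: int) -> tuple[int, int]:
    """Convert a one-based closed interval to a zero-based half-open one."""
    # Only the start shifts; the end keeps the same number.
    return start - 1, end


def zero_to_one_based(start: int, end: int) -> tuple[int, int]:
    """Convert a zero-based half-open interval to a one-based closed one."""
    return start + 1, end


sequence = "CGATGC"

# ATG is 3-5 in one-based coordinates.
start0, end0 = one_to_zero_based(3, 5)

# Python slices are zero-based and half-open, so this extracts the codon.
codon = sequence[start0:end0]
```

Here `start0, end0` is `(2, 5)` and `codon` is `"ATG"`, matching the example above.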