
Coding Phases / Frames

The phase (sometimes called the frame) describes how to translate the individual coding parts of a gene, the coding exons. Phases 1 and 2 are defined differently in the GFF and Ensembl formats!
In Ensembl, the phase is defined for exon objects like this:
The Ensembl phase convention can be thought of as “the number of bases of the first codon which are on the previous exon”. It is therefore 0, 1 or 2 (-1 means the exon is non-coding).
In ASCII art, with alternate codons represented by ### and +++:

       Previous Exon   Intron   This Exon

    ...-------------            -------------...

    5'                  Phase                3'

    ...#+++###+++###     0      +++###+++###+...

    ...+++###+++###+     1      ++###+++###++...

    ...++###+++###++     2      +###+++###+++...
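
The diagram above can be turned into a small calculation. The following is a minimal sketch, not the Ensembl API: it assumes every exon in the list is fully coding, that the lengths are given in transcript order, and that the first exon starts on a codon boundary. The function name `ensembl_phases` and the term `end_phase` are used here for illustration only.

```python
def ensembl_phases(coding_lengths, first_phase=0):
    """Return (phase, end_phase) pairs for consecutive coding exons.

    Assumption: each length is the number of coding bases in the exon,
    listed in transcript (5' to 3') order.
    """
    result = []
    phase = first_phase
    for length in coding_lengths:
        # Bases of the last, incomplete codon that remain on this exon.
        end_phase = (phase + length) % 3
        result.append((phase, end_phase))
        # Those leftover bases are "on the previous exon" for the next one.
        phase = end_phase
    return result


if __name__ == "__main__":
    # Exons with 4, 5 and 3 coding bases
    print(ensembl_phases([4, 5, 3]))
```

With these inputs the second exon gets phase 1, because one base of its first codon lies at the end of the first exon.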

In the GFF format, the 8th column gives the phase for CDS features. The specification defines the phase as follows:

For features of type “CDS”, the phase indicates where the feature (i.e. exon) begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon. In other words, a phase of “0” indicates that the next codon begins at the first base of the region described by the current line, a phase of “1” indicates that the next codon begins at the second base of this region, and a phase of “2” indicates that the codon begins at the third base of this region.
For forward strand features, phase is counted from the start field. For reverse strand features, phase is counted from the end field.

[Ref]
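
As an illustration of the definition quoted above, here is a minimal sketch of how a GFF phase could be applied to a single CDS feature. It assumes the input is the forward-strand genomic sequence of the feature from its start to its end field; the function name `cds_codons` is hypothetical and not part of any GFF library.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def cds_codons(genomic_seq, strand, phase):
    """Split one CDS feature into complete codons, honouring its GFF phase.

    genomic_seq: forward-strand sequence from the start to the end field.
    strand: "+" or "-".
    phase: 0, 1 or 2 (number of bases to skip before the next codon).
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    if strand == "+":
        seq = genomic_seq
    elif strand == "-":
        # Reverse strand: phase is counted from the end field,
        # so reverse-complement first and then skip bases.
        seq = genomic_seq.translate(_COMPLEMENT)[::-1]
    else:
        raise ValueError("strand must be '+' or '-'")
    seq = seq[phase:]
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(cds_codons("GATGGCCTAA", "+", 1))
    print(cds_codons("TTAGGCCATC", "-", 1))
```

Both calls describe the same coding sequence, once on each strand, and should yield the same codons. Any bases skipped at the start or dropped at the end belong to codons shared with the neighbouring CDS features.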

In effect, you can usually translate the phase from Ensembl to GFF-style like this:

  • 0 to 0
  • 1 to 2, the first two bases of this exon complete the codon started on the previous exon
  • 2 to 1, the first base of this exon completes the codon started on the previous exon
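
The mapping in the list above amounts to `(3 - phase) % 3`, and the same formula converts in both directions. A minimal sketch follows; the function names are hypothetical, and returning `None` for Ensembl's non-coding value -1 is an assumption about how a caller might want to handle it.

```python
def ensembl_to_gff_phase(phase):
    """Convert an Ensembl exon phase to a GFF-style CDS phase."""
    if phase == -1:
        # Non-coding exon in Ensembl: no CDS phase applies (assumed handling).
        return None
    if phase not in (0, 1, 2):
        raise ValueError("Ensembl phase must be -1, 0, 1 or 2")
    return (3 - phase) % 3


def gff_to_ensembl_phase(phase):
    """Convert a GFF-style CDS phase to an Ensembl exon phase."""
    if phase not in (0, 1, 2):
        raise ValueError("GFF phase must be 0, 1 or 2")
    return (3 - phase) % 3


if __name__ == "__main__":
    for p in (0, 1, 2):
        print(p, "->", ensembl_to_gff_phase(p))
```

This only covers the usual case described above; exons that are partly untranslated may need their coding start worked out separately before the phase is meaningful.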

The DAS protocol defines the phase in the same way as the GFF format:
The tag indicates the position of the feature relative to open reading frame, if any. It may be one of the integers 0, 1 or 2, corresponding to each of the three reading frames, or “-” if the feature is unrelated to a reading frame.

[More information on different formats]