Bioinformatics work notes

Navigating the Genome

The CCDS project

December 13, 2017

The Consensus CoDing Sequence (CCDS) project is a collaborative effort to identify a core set of human and mouse protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations.
CCDS IDs are helpful when referring to specific genes and annotation versions in publications, as they can still be tracked and found after the underlying genome has changed.

Participating institutes:

European Bioinformatics Institute (EBI)
National Center for Biotechnology Information (NCBI)
Wellcome Trust Sanger Institute (WTSI)
University of California, Santa Cruz (UCSC)
HUGO Gene Nomenclature Committee (HGNC)
Mouse Genome Informatics (MGI)

Project page at NCBI

CCDS Identifiers and Tracking
Annotated genes are given a unique identifier number and a version number (e.g. CCDS1.1, CCDS234.1). The version number is updated if the CDS structure changes, or if the underlying genome sequence changes at that location. When the annotation and the sequence-based genome browsers go through their update cycles, the CCDS set is mapped forward and keeps its identifiers. All changes to existing CCDS genes are made by collaborative agreement; no single group changes the set unilaterally.
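
To make the identifier scheme concrete, here is a minimal Python sketch that splits a versioned CCDS ID into its identifier number and version number. The pattern is inferred only from the two examples above (CCDS1.1, CCDS234.1), and the names CcdsId, parse_ccds_id and same_ccds_entry are made up for illustration; none of this is an official CCDS tool.

```python
import re
from typing import NamedTuple


class CcdsId(NamedTuple):
    """Hypothetical container for the two parts of a versioned CCDS ID."""
    number: int   # the stable identifier number, e.g. 234 in CCDS234.1
    version: int  # the version number, e.g. 1 in CCDS234.1


# Assumption: an ID looks like "CCDS<digits>.<digits>", as in the examples above.
_CCDS_PATTERN = re.compile(r"^CCDS(\d+)\.(\d+)$")


def parse_ccds_id(text: str) -> CcdsId:
    """Split a string such as 'CCDS234.1' into (number, version)."""
    match = _CCDS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a versioned CCDS ID: {text!r}")
    return CcdsId(int(match.group(1)), int(match.group(2)))


def same_ccds_entry(a: str, b: str) -> bool:
    """True if two IDs share the identifier number, ignoring the version."""
    return parse_ccds_id(a).number == parse_ccds_id(b).number
```

Comparing only the identifier number, as same_ccds_entry does, is one way to recognise that an ID cited in an older publication and a newer version of it refer to the same tracked entry, even though the version number has changed in between.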

Image: NCBI, modified