Most of us take vaccinations for granted and rely on them from our very first days. Whooping cough, for example, can be deadly, especially for babies who are too young to be protected by their own vaccination. Since 2010, the Centers for Disease Control and Prevention (CDC) has recorded between 10,000 and 50,000 cases each year in the United States, with up to 20 babies dying each year. One recent study showed that many whooping cough deaths among babies could be prevented if all babies received the first dose of the vaccine on time at 2 months of age, as soon as they are old enough to be vaccinated (CDC). Still, some parents believe they know better and risk their children's lives by not vaccinating them at all.
For the US, the CDC recommends vaccinating newborns and babies against the following diseases:
For Germany, the situation is almost the same; the following vaccinations are recommended for babies under 2 years of age:
- Hib (Haemophilus influenzae type b)
- Hepatitis B
- Pertussis (whooping cough)
- Poliomyelitis (polio)
- Varicella (chickenpox)
- Meningococcus serogroup C
Sources: CDC, Robert-Koch-Institut
For any large software project (i.e. one that requires more than a few scripts performing a one-off task) and for every project initiated by a customer request, it is useful to define the requirements precisely before writing any code. This can be painful at times and slow down the fun of coding, but in the end it avoids a lot of frustration on both sides.
Here is a short summary of what a Software Requirements Specification (SRS, IEEE 830) is, how to write one, and what it is good for.
An SRS is a complete description of the behavior of the system to be developed, including use cases.
The benefits of writing specifications when planning a software project are:
- Establish the basis for agreement between the customers and the suppliers on what the software product is to do.
- Reduce the development effort by revealing omissions, misunderstandings, and inconsistencies early in the development cycle, which avoids redesign, recoding, and retesting.
- Provide a basis for estimating costs and schedules.
- Provide a baseline for validation (comparison against what the customer needs) and verification (comparison with the formal specifications).
- Facilitate transfer to new users or new machines.
- Serve as a basis for enhancement.
Key points to address:
- Required functionality.
- External interfaces.
- Design constraints imposed on an implementation.
Keep design and coding details out of the specification. Hardware requirements and similar items belong in the general System Specification, not in the SRS. The content and language of the document should meet the following quality criteria:
Complete, Consistent, Accurate, Modifiable, Ranked, Testable, Traceable, Unambiguous, Valid, Verifiable
Descriptions of use cases, mock-up GUI components, and other visual aids are extremely useful for communicating with the parties involved.
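Some of the criteria above (traceable, ranked, testable/verifiable) can be checked mechanically if requirements are kept as structured records rather than free text. The following is a minimal sketch under assumptions of my own: the `Requirement` fields, the ID scheme, and the 1/2/3 rank scale (essential, conditional, optional) are illustrative and not prescribed by IEEE 830 as a data format.

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    """One SRS entry; the field names are a hypothetical layout."""
    req_id: str   # unique identifier, so the requirement is traceable
    text: str     # the requirement statement itself
    rank: int     # assumed scale: 1 = essential, 2 = conditional, 3 = optional
    tests: list = field(default_factory=list)  # IDs of acceptance tests

def check_srs(requirements):
    """Return a list of problems that violate some of the quality criteria."""
    problems = []
    seen = set()
    for req in requirements:
        if req.req_id in seen:
            problems.append(f"{req.req_id}: duplicate ID (not traceable)")
        seen.add(req.req_id)
        if req.rank not in (1, 2, 3):
            problems.append(f"{req.req_id}: missing or invalid rank")
        if not req.tests:
            problems.append(f"{req.req_id}: no acceptance test (not verifiable)")
    return problems

reqs = [
    Requirement("REQ-001", "The system shall export results as CSV.", 1, ["T-010"]),
    Requirement("REQ-002", "The GUI should support a dark theme.", 3),
]
for problem in check_srs(reqs):
    print(problem)  # REQ-002: no acceptance test (not verifiable)
```

A spreadsheet or issue tracker serves the same purpose in practice; the point is only that each requirement carries an ID, a rank, and a link to how it will be verified.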
As part of the primary analysis, Illumina sequencing machines measure the intensity of the channels used to encode the different bases and identify the most likely base at each position of a sequencing read (tag). The Real Time Analysis (RTA) software writes the base and the confidence in the call, as a quality score, to base call (.bcl) files. As the name implies, this happens in real time: for every cycle of the sequencing run, a call is added for every location (cluster) identified on the flow cell, across all lanes and tiles. Bcl files are stored in binary format and represent the raw data output of a sequencing run. The format is documented by Illumina. Software such as Casava/BclToFastq, Eland, or the iSAAC aligner can make use of these files.
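To make the binary layout concrete, here is a minimal decoding sketch. It assumes the classic uncompressed .bcl layout: a 4-byte little-endian unsigned integer with the number of clusters, followed by one byte per cluster, where bits 0-1 encode the base (A, C, G, T = 0, 1, 2, 3), bits 2-7 hold the quality score, and a byte value of 0 means no call. Gzipped .bcl files would need to be decompressed first, and newer instruments use other variants of the format; the function names are my own.

```python
import struct

BASES = "ACGT"  # 2-bit base codes 0..3

def decode_bcl_byte(byte):
    """Decode one cluster byte into (base, quality); 0 means no call."""
    if byte == 0:
        return "N", 0
    base = BASES[byte & 0b11]  # bits 0-1: base
    quality = byte >> 2        # bits 2-7: quality score
    return base, quality

def parse_bcl(data):
    """Parse the bytes of one uncompressed .bcl file (one cycle, one tile)."""
    (n_clusters,) = struct.unpack_from("<I", data, 0)  # uint32, little-endian
    body = data[4:4 + n_clusters]
    if len(body) != n_clusters:
        raise ValueError("truncated .bcl file")
    return [decode_bcl_byte(b) for b in body]  # iterating bytes yields ints

# Three clusters: G with quality 30, a no-call, A with quality 2
data = struct.pack("<I", 3) + bytes([(30 << 2) | 2, 0, (2 << 2) | 0])
print(parse_bcl(data))  # [('G', 30), ('N', 0), ('A', 2)]
```

Reading a real file is then just `parse_bcl(open(path, "rb").read())` for an uncompressed file.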
The *.bcl files are stored in the BaseCalls directory:
They are named in the format:
To avoid errors during downstream processing caused by missing calls, software such as iSAAC and configureBclToFastq offers an "--ignore-missing-bcl" command line option. With this option, missing *.bcl files are interpreted as no calls (N) at the affected positions.
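The effect of such an option can be illustrated with a small sketch: one base-call list per cycle, where a cycle whose file is missing contributes an N to every read instead of aborting the conversion. This only mimics the described behavior; it is not how iSAAC or configureBclToFastq are implemented, and `load_cycles`, `call_read`, and the `reader` callable are hypothetical names.

```python
import os

def load_cycles(paths, reader, ignore_missing=True):
    """Read one file per cycle; a missing file becomes None if ignore_missing."""
    cycles = []
    for path in paths:
        if os.path.exists(path):
            cycles.append(reader(path))  # reader returns one base per cluster
        elif ignore_missing:
            cycles.append(None)          # later treated as a no-call (N)
        else:
            raise FileNotFoundError(path)
    return cycles

def call_read(cycle_calls, cluster):
    """Assemble the read for one cluster across all cycles."""
    read = []
    for calls in cycle_calls:
        if calls is None:
            read.append("N")             # missing cycle -> no call
        else:
            read.append(calls[cluster])
    return "".join(read)

cycles = [["A", "C"], None, ["G", "T"]]  # the file for cycle 2 is missing
print([call_read(cycles, c) for c in range(2)])  # ['ANG', 'CNT']
```

Without the option (`ignore_missing=False` here), a single missing file stops the whole run, which is the error the option is meant to avoid.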
Sources: Illumina, SeqAnswers
Sequence uniqueness within the genome plays an important role when mapping short sequences, e.g. next-generation sequencing reads. It is one of the factors that can introduce bias into sequencing data or its analysis. The other important factor is GC content: GC-rich sequences (e.g. genic/exonic regions) as well as very GC-poor regions are often under-represented (Bentley et al. 2008), mainly because of amplification steps in the protocol. Reads that map to multiple regions are often discarded, so genomic regions with high sequence degeneracy or low sequence complexity show lower mapped read coverage than unique regions, which creates a systematic bias.
The CRG Alignability tracks in the UCSC Genome Browser display how uniquely k-mer sequences align to a region of the genome. As you can see from the tracks, mappability increases with read length:
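A simple way to spot GC-rich or GC-poor regions is to compute the GC fraction in windows along the sequence. The sketch below assumes fixed, non-overlapping or overlapping windows (set by `step`) and ignores N positions in the denominator; both choices are my own and not taken from the cited papers.

```python
def gc_content(seq):
    """Fraction of G/C among unambiguous bases; N is ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")  # window contains no called bases
    return (seq.count("G") + seq.count("C")) / acgt

def windowed_gc(seq, window, step):
    """(start, GC fraction) for each window start position."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

print(windowed_gc("GGCCATATGCNN", window=4, step=4))
# [(0, 1.0), (4, 0.0), (8, 1.0)]
```

Windows near 0.0 or 1.0 are the ones where lower coverage would be expected.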
CRG mappability tracks for different read lengths at the UCSC browser
For each window (of sizes 36, 40, 50, 75 or 100 nts), a mappability score was computed:
S = 1 / (number of matches found in the genome),
so S=1 means one match in the genome, S=0.5 means two matches, and so on. A further description is given in the publication by Thomas Derrien, Paolo Ribeca, et al. The data for these tracks can be downloaded. If you are working with other read lengths or genomes, you can run the software to generate the data yourself: get the GEM library (latest version on GitHub) and unpack it with
tar xjvf GEM-libraries-Linux-x86_64.tbz2, create an index from your genome FASTA file (here called genome.fa):
gem-indexer -i genome.fa -o gem_index
and then run the mappability step, e.g. for a read length of 250:
gem-mappability -I gem_index -l 250 -o mappability_250.gem
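The score S = 1 / (number of matches found in the genome) can be illustrated on a toy sequence. This sketch is a strong simplification of what GEM does: it counts only exact matches on the given strand, whereas the published tracks were computed with GEM's own matching criteria. The function name is my own.

```python
from collections import Counter

def mappability(genome, k):
    """S = 1 / (occurrences in the genome) for the k-mer starting at each position.

    Simplification: exact matches on the given strand only.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)  # occurrences of each k-mer
    return [1 / counts[kmer] for kmer in kmers]

genome = "ACGTACGTTT"
print(mappability(genome, 4))  # [0.5, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0]
print(mappability(genome, 5))  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

The repeated 4-mer ACGT scores 0.5, while every 5-mer is unique, which shows on a small scale why mappability increases with read length.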
- Derrien et al. Fast computation and applications of genome mappability. PLoS One. 2012.
- Koehler et al. The uniqueome: a mappability resource for short-tag sequencing. Bioinformatics. 2011; 27(2): 272–274.
- Blog post at MassGenomics
- Cheung et al. Systematic bias in high-throughput sequencing data and its correction by BEADS. 2011.
- Bentley et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008.